Posted to user@tez.apache.org by igotux igotux <ig...@gmail.com> on 2014/07/31 09:04:25 UTC

(Unknown)

Hello Everyone,

Can someone help explain what the numbers next to Map 1 / Map 2 and
Reducer 3 mean?

~~~~~~~~~~~~~~~
Status: Running (application id: application_1404180111945_438880)

Map 1: -/- Map 2: -/- Reducer 3: 0/1
Map 1: 0/2 Map 2: -/- Reducer 3: 0/1
Map 1: 0/2 Map 2: 0/8 Reducer 3: 0/1
Map 1: 0/2 Map 2: 0/8 Reducer 3: 0/1
Map 1: 0/2 Map 2: 0/8 Reducer 3: 0/1
Map 1: 1/2 Map 2: 0/8 Reducer 3: 0/1
Map 1: 2/2 Map 2: 0/8 Reducer 3: 0/1
Map 1: 2/2 Map 2: 2/8 Reducer 3: 0/1
Map 1: 2/2 Map 2: 3/8 Reducer 3: 0/1
Map 1: 2/2 Map 2: 4/8 Reducer 3: 0/1
Map 1: 2/2 Map 2: 6/8 Reducer 3: 0/1
Map 1: 2/2 Map 2: 8/8 Reducer 3: 0/1
Map 1: 2/2 Map 2: 8/8 Reducer 3: 1/1
Status: Finished successfully
OK
~~~~~~~~~~~~~~~

The MR hive job runs with 16 mappers and one reducer.

Re:

Posted by Hitesh Shah <hi...@apache.org>.
There are multiple reasons why Tez may run a different number of tasks:
    - Hive itself behaves differently. With MR, it may have been processing data from 2 tables in the same map stage, which affects the number of tasks. With Tez, it may end up processing each table in a separate vertex.
    - Tez does some grouping of input splits so that it runs a smaller set of tasks, depending on the configured min/max size of data processed by a task.
    - Furthermore, Tez looks at the available cluster capacity to decide how many tasks to run for a single vertex. For example, if a cluster has capacity to run only 10 containers at a time, Tez will try to run at most 1.7 * 10 = 17 tasks (1.7 is a configurable value). This holds as long as the upper bound on the data size processed by a task is not crossed.
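
To make the arithmetic in the last two points concrete, here is a rough Python sketch. It is not Tez's actual grouping code: the function name, the parameter names, and the 50 MiB / 1 GiB bounds are assumptions chosen only for illustration. It starts from the cluster capacity times the 1.7 factor and then adjusts the count when the data per task would fall outside the min/max bounds.

```python
import math

def estimate_task_count(total_bytes, available_containers, waves=1.7,
                        min_bytes_per_task=50 * 1024**2,
                        max_bytes_per_task=1024**3):
    """Rough estimate of tasks for one vertex (illustrative, not Tez's code)."""
    # Start from cluster capacity: factor * containers that can run at once.
    desired = max(1, round(waves * available_containers))
    bytes_per_task = total_bytes / desired
    if bytes_per_task > max_bytes_per_task:
        # Each task would process too much data, so use more tasks.
        desired = math.ceil(total_bytes / max_bytes_per_task)
    elif bytes_per_task < min_bytes_per_task:
        # Each task would process too little data, so use fewer tasks.
        desired = max(1, math.ceil(total_bytes / min_bytes_per_task))
    return desired

# 10 GiB of input on a cluster with room for 10 containers.
print(estimate_task_count(10 * 1024**3, 10))
```

With these placeholder bounds, 10 GiB over 10 containers stays at 17 tasks, while a much larger input is pushed above 17 by the max bound and a much smaller input is pulled below 17 by the min bound.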

thanks
— Hitesh




Re:

Posted by igotux igotux <ig...@gmail.com>.
Thanks Hitesh. That explains the DAG.

When you said completed vs. total tasks for a given vertex, does that mean
there was a total of 0/2 + 0/8 = 0/10 (10 map tasks) for this Tez job?
That would mean the same query launched 16 map tasks in Hive MR and is
now launching only 10. Also, can you please explain how the number of
tasks got reduced here?

Thanks.



Re:

Posted by Hitesh Shah <hi...@apache.org>.
Hi 

This looks like a 3-vertex DAG. It could possibly be a linear DAG such as Map1 -> Map2 -> Reduce3, or a join DAG where
Map1 -> Reduce3 and Map2 -> Reduce3.
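
For illustration, the two candidate shapes can be written out as Graphviz DOT text with a few lines of Python. This is only a hand-built sketch: the helper name and the vertex labels are assumptions, and the .dot file that Tez writes to the logs will contain more detail than this.

```python
def to_dot(name, edges):
    """Render a list of (source, target) edges as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Linear shape: each vertex feeds the next one.
linear = [("Map 1", "Map 2"), ("Map 2", "Reducer 3")]
# Join shape: both map vertices feed the same reducer.
join = [("Map 1", "Reducer 3"), ("Map 2", "Reducer 3")]

print(to_dot("linear", linear))
print(to_dot("join", join))
```

Saving either output to a file and running it through Graphviz (for example `dot -Tpng`) draws the corresponding graph.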

If you can get the application logs from YARN (using bin/yarn logs -applicationId application_1404180111945_438880), you will be able to get a .dot file from the logs, which will allow you to
visualize the DAG using a tool like Graphviz.

As for the console output, 0/2 or 0/8 is just the number of completed vs. total tasks for a given vertex.
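
As a small sketch of how to read one of those progress lines, the Python below splits a line into per-vertex (completed, total) pairs and sums them. The regular expression and function name are assumptions written for this example, not part of Hive or Tez, and "-" is treated as "not yet known".

```python
import re

# Matches entries such as "Map 1: 2/2" or "Reducer 3: -/-".
VERTEX_RE = re.compile(r"(Map \d+|Reducer \d+):\s*(-|\d+)/(-|\d+)")

def parse_progress(line):
    """Return {vertex: (completed, total)}; None where the console shows '-'."""
    progress = {}
    for name, done, total in VERTEX_RE.findall(line):
        progress[name] = (None if done == "-" else int(done),
                          None if total == "-" else int(total))
    return progress

line = "Map 1: 2/2 Map 2: 6/8 Reducer 3: 0/1"
progress = parse_progress(line)
completed = sum(d for d, t in progress.values() if d is not None)
total = sum(t for d, t in progress.values() if t is not None)
print(progress)
print(f"{completed}/{total} tasks complete")
```

For the sample line this counts 8 of 11 tasks complete: 10 map tasks across the two map vertices plus the single reducer task.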

thanks
— Hitesh

