Posted to user@hive.apache.org by Nitin Kumar <nk...@gmail.com> on 2016/04/25 10:27:38 UTC

Varying vcores/ram for hive queries running Tez engine

I was trying to benchmark some hive queries. I am using the tez execution
engine. I varied the values of the following properties:

   1. hive.tez.container.size
   2. tez.task.resource.memory.mb
   3. tez.task.resource.cpu.vcores

Changes in the value of property 1 are reflected properly. However, it seems
that hive does not respect changes in the value of property 3; it always
allocates one vcore per requested container (RM is configured to use the
DominantResourceCalculator). This got me thinking about the precedence of
property values in hive and tez.

I have the following questions with respect to these configurations

   1. Does hive respect the set values for the properties 2 and 3 at all?
   2. If I set property 1 to a value say 2048 MB and property 2 is set to a
      value of say 1024 MB does this mean that I am wasting about a GB of
      memory for each spawned container?
   3. Is there a property in hive similar to property 1 that allows me to use
      the 'set' command in the .hql file to specify the number of vcores to
      use per container?
   4. Changes in value for the property tez.am.resource.cpu.vcores are
      reflected at runtime. However I do not observe the same behaviour with
      property 3. Are there other configurations that take precedence over it?

Your inputs and suggestions would be highly appreciated.

Thanks!


PS: Tests conducted on a 5 node cluster running HDP 2.3.0

Re: Varying vcores/ram for hive queries running Tez engine

Posted by Nitin Kumar <nk...@gmail.com>.
Thanks Bikas for clearing that up.
Much appreciated!

Regards,
Nitin

On Thu, May 5, 2016 at 1:21 PM, Bikas Saha <bi...@apache.org> wrote:

> Tez executor processes run 1 task at a time. While the inputs/outputs of
> these tasks may have parallel threads, they are mostly doing IO.
> Essentially the user code doing the processing is running on a single
> thread. Hence giving more cores does not change much unless the user
> processing code (i.e. Hive operators in your case) can utilize that via
> running multi-threaded CPU intensive code. Hence performance gains for such
> (essentially single threaded) apps come from task parallelism.
>
> DRF is a complex feature in YARN whose primary design goal is to let CPU
> intensive tasks share the cluster properly with non-CPU intensive tasks,
> such that CPU intensive tasks don't starve others out.
>
> Bikas

RE: Varying vcores/ram for hive queries running Tez engine

Posted by Bikas Saha <bi...@apache.org>.
Tez executor processes run 1 task at a time. While the inputs/outputs of these tasks may have parallel threads, they are mostly doing IO. Essentially the user code doing the processing is running on a single thread. Hence giving more cores does not change much unless the user processing code (i.e. Hive operators in your case) can utilize that via running multi-threaded CPU intensive code. Hence performance gains for such (essentially single threaded) apps come from task parallelism.

DRF is a complex feature in YARN whose primary design goal is to let CPU intensive tasks share the cluster properly with non-CPU intensive tasks, such that CPU intensive tasks don't starve others out.

Bikas
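
For what it's worth, the trade-off described above can be put into numbers. The sketch below is only an illustration, assuming the scheduler counts both memory and vcores when placing containers (as with the DominantResourceCalculator mentioned in the original question). The node capacity and the Resource / containers_per_node names are made up for the example, and it ignores the AM container and any rounding to scheduler allocation increments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


def containers_per_node(node: Resource, container: Resource) -> int:
    """Containers that fit on one node when both memory and vcores are counted."""
    return min(node.memory_mb // container.memory_mb, node.vcores // container.vcores)


# Hypothetical NodeManager capacity; not taken from the thread.
node = Resource(memory_mb=32768, vcores=8)

# Keep the container memory fixed and vary only the vcores requested per container.
for vcores in (1, 2, 4):
    fit = containers_per_node(node, Resource(memory_mb=2048, vcores=vcores))
    print(f"{vcores} vcore(s) per container -> {fit} concurrent tasks per node")
```

With these made-up numbers the per-node task count drops from 8 to 4 to 2 as the vcores per container go up, so for single threaded task code a larger vcore request may cost parallelism without speeding up individual tasks.
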

> Date: Thu, 5 May 2016 12:23:17 +0530
> Subject: Re: Varying vcores/ram for hive queries running Tez engine
> From: nk94.nitinkumar@gmail.com
> To: user@tez.apache.org
>
> Thanks Bikas and Hitesh for your inputs.
>
> I confirmed that hive.tez.cpu.vcores allocates the desired number of vcores to task containers.
>
> I carried out my benchmarking experiments and observed that increasing the number of vcores allocated to a container did not have any noticeable impact on the overall completion time of the query.
>
> I have attached an Excel sheet that documents the running times. I have also referenced the query I used to benchmark. I have not done any optimization on the query side and just wanted to observe the impact of changing container sizes and vcores.
>
> I was on the HDP forum and it was said that parallelism in tez is achieved by means of individual tasks and increasing the cores would not help.
>
> Can you confirm this behavior?
>
> Thanks and regards,
> Nitin


Re: Varying vcores/ram for hive queries running Tez engine

Posted by Nitin Kumar <nk...@gmail.com>.
Thanks Bikas and Hitesh for your inputs.

I confirmed that hive.tez.cpu.vcores allocates the desired number of vcores
to task containers.

I carried out my benchmarking experiments and observed that increasing the
number of vcores allocated to a container did not have any noticeable
impact on the overall completion time of the query.

I have attached an Excel sheet that documents the running times. I have
also referenced the query
<https://gist.github.com/NitinKumar94/fbca5d56caa6c150eaa4c8528a63252c> I
used to benchmark. I have not done any optimization on the query side and
just wanted to observe the impact of changing container sizes and vcores.

I was on the HDP forum and it was said that parallelism in tez is achieved
by means of individual tasks and increasing the cores would not help.

Can you confirm this behavior?

Thanks and regards,
Nitin
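
A sweep like the one described above could be scripted roughly as follows. This is a minimal sketch, not the script used for these benchmarks: it prints a 'set' preamble for each combination of hive.tez.container.size and hive.tez.cpu.vcores, to be placed at the top of the .hql file for that run. The swept values and the set_preamble helper are hypothetical, and hive.execution.engine is included only on the assumption that the session does not already default to tez.

```python
from itertools import product


def set_preamble(container_mb: int, vcores: int) -> str:
    """Render the session settings for one benchmark run as 'set' statements."""
    settings = {
        "hive.execution.engine": "tez",  # assumption: not already the default
        "hive.tez.container.size": container_mb,
        "hive.tez.cpu.vcores": vcores,
    }
    return "\n".join(f"set {key}={value};" for key, value in settings.items())


# Hypothetical values to sweep; adjust to the cluster being tested.
container_sizes_mb = (1024, 2048, 4096)
vcore_counts = (1, 2, 4)

for container_mb, vcores in product(container_sizes_mb, vcore_counts):
    print(f"-- run: {container_mb} MB, {vcores} vcore(s)")
    print(set_preamble(container_mb, vcores))
    print()
```

Generating the preamble keeps the query text identical across runs, so only the container size and vcores differ between measurements.
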

On Thu, May 5, 2016 at 11:34 AM, Hitesh Shah <hi...@apache.org> wrote:

> Bikas’ comment (and mine below) is relevant only for task specific
> settings. Hive does not override any settings for the Tez AM so the tez
> configs for the AM memory/vcores will be reflected at runtime.
>
> I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3) which
> may be why your setting for (3) is not taking effect. Additionally, Hive
> also tends to fall back to MR based values if tez specific values are not
> specified, which might be something else you may wish to ask on the Hive
> user list.
>
> thanks
> — Hitesh

Re: Varying vcores/ram for hive queries running Tez engine

Posted by Hitesh Shah <hi...@apache.org>.
Bikas’ comment (and mine below) is relevant only for task specific settings. Hive does not override any settings for the Tez AM so the tez configs for the AM memory/vcores will be reflected at runtime.

I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3) which may be why your setting for (3) is not taking effect. Additionally, Hive also tends to fall back to MR based values if tez specific values are not specified, which might be something else you may wish to ask on the Hive user list.

thanks
— Hitesh
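
One way to picture the precedence discussed in this thread is the sketch below. It is an approximation based only on the replies here, not Hive's actual code: the Hive proxy configs win when set to a positive value, otherwise an MR based value is used, and the tez.task.resource.* keys are never consulted for task containers. The MR property names (mapreduce.map.memory.mb, mapreduce.map.cpu.vcores), their defaults, and the resolve_task_resource helper are assumptions made for illustration; the real behaviour may differ between Hive versions.

```python
def resolve_task_resource(conf):
    """Return (memory_mb, vcores) for a task container under the assumed precedence."""

    def pick(hive_key, mr_key, default):
        # The Hive proxy config wins when it is set to a positive value.
        hive_value = int(conf.get(hive_key, -1))
        if hive_value > 0:
            return hive_value
        # Otherwise fall back to the MR based value (assumed name and default).
        return int(conf.get(mr_key, default))

    memory_mb = pick("hive.tez.container.size", "mapreduce.map.memory.mb", 1024)
    vcores = pick("hive.tez.cpu.vcores", "mapreduce.map.cpu.vcores", 1)
    return memory_mb, vcores


# Loosely based on the original question; the vcores value 4 is made up.
session = {
    "hive.tez.container.size": 2048,
    "tez.task.resource.memory.mb": 1024,
    "tez.task.resource.cpu.vcores": 4,
}
print(resolve_task_resource(session))  # (2048, 1)

# Setting the Hive proxy config changes the vcores request.
session["hive.tez.cpu.vcores"] = 4
print(resolve_task_resource(session))  # (2048, 4)
```

Under these assumptions the first call reproduces the one vcore per container seen in the original question, and the second matches the later confirmation that hive.tez.cpu.vcores takes effect.
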


> On May 4, 2016, at 10:14 PM, Bikas Saha <bi...@apache.org> wrote:
> 
> IIRC 1) will override 2) since 2) is the tez config and 1) is the Hive config that is a proxy for 2).
> 
> Bikas


RE: Varying vcores/ram for hive queries running Tez engine

Posted by Bikas Saha <bi...@apache.org>.
IIRC 1) will override 2) since 2) is the tez config and 1) is the Hive config that is a proxy for 2).
Bikas

> Date: Mon, 25 Apr 2016 13:57:38 +0530
> Subject: Varying vcores/ram for hive queries running Tez engine
> From: nk94.nitinkumar@gmail.com
> To: user@hive.apache.org; user@tez.apache.org
>
> I was trying to benchmark some hive queries. I am using the tez
> execution engine. I varied the values of the following properties:
>
>    1. hive.tez.container.size
>    2. tez.task.resource.memory.mb
>    3. tez.task.resource.cpu.vcores
>
> Changes in the value of property 1 are reflected properly. However, it
> seems that hive does not respect changes in the value of property 3; it
> always allocates one vcore per requested container (RM is configured to
> use the DominantResourceCalculator). This got me thinking about the
> precedence of property values in hive and tez.
>
> I have the following questions with respect to these configurations
>
>    1. Does hive respect the set values for the properties 2 and 3 at all?
>    2. If I set property 1 to a value say 2048 MB and property 2 is set
>       to a value of say 1024 MB does this mean that I am wasting about a
>       GB of memory for each spawned container?
>    3. Is there a property in hive similar to property 1 that allows me
>       to use the 'set' command in the .hql file to specify the number of
>       vcores to use per container?
>    4. Changes in value for the property tez.am.resource.cpu.vcores are
>       reflected at runtime. However I do not observe the same behaviour
>       with property 3. Are there other configurations that take
>       precedence over it?
>
> Your inputs and suggestions would be highly appreciated.
>
> Thanks!
>
> PS: Tests conducted on a 5 node cluster running HDP 2.3.0