Posted to dev@hive.apache.org by Edward Capriolo <ed...@gmail.com> on 2010/08/09 23:22:03 UTC

Hive local mode on by default for 0.6.0

I already caught someone on IRC who was very surprised by the local
mode in Hive trunk. Is local mode on by default?

Do you think the 0.6.0 release should have this on by default? There
have been a few issues like HIVE-1520, and it seems like letting this
out into the wild without users actively turning it on might expose edge
cases and complications.

Regards,
Edward

RE: Hive local mode on by default for 0.6.0

Posted by Joydeep Sen Sarma <js...@facebook.com>.
> I personally feel like it would be better to turn this off.

I agree. Unless there's a contrary opinion, I am going to turn it off by default in an upcoming diff/checkin (and reply to this thread again).

(I just realized that the title of this thread is misleading.)
_______________________________________
From: Edward Capriolo [edlinuxguru@gmail.com]
Sent: Monday, August 09, 2010 8:04 PM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive local mode on by default for 0.6.0

Another concern is that some people launch jobs from machines without
multiple disks and cores, sometimes even from the NameNode. I think on
these machines local execution will perform poorly or even be dangerous,
e.g. an OOM could crash the NameNode and corrupt the FSImage.

Someone came into IRC wondering why jobs were not showing up in the
JobTracker. I personally feel it would be better to turn this
off. People who know about it and want it on can set it to true.

It is a super cool feature though.

Re: Hive local mode on by default for 0.6.0

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Aug 9, 2010 at 9:28 PM, Joydeep Sen Sarma <js...@facebook.com> wrote:
> We enabled a feature called 'auto-local mode' (HIVE-1408). The query processor looks at the size of the input and decides dynamically whether local mode execution can be done. For a multi-job query, the determination is made on a per-job basis.
>
> We enabled it by default in trunk so it can get some coverage. Local mode support in 0.6 has some bugs (in fact, a big part of this JIRA was a comprehensive test for local mode and small fixes for the bugs it uncovered). The relevant option is:
>
> set hive.exec.mode.local.auto=<true/false>
>
>
> I have been a little worried about enabling this by default; we can turn it off if required. The case that worries me most is a lot of users referring to scripts (via TRANSFORM clauses) that are available only on the cluster nodes and not on the client node. Another assumption is that mapred.local.dir is set to a value that is valid on the client side (which may not be the case if the same Hadoop config is shared across the client and server sides).
>
> I promise to add some documentation about this on the wiki ASAP.

Another concern is that some people launch jobs from machines without
multiple disks and cores, sometimes even from the NameNode. I think on
these machines local execution will perform poorly or even be dangerous,
e.g. an OOM could crash the NameNode and corrupt the FSImage.

Someone came into IRC wondering why jobs were not showing up in the
JobTracker. I personally feel it would be better to turn this
off. People who know about it and want it on can set it to true.

It is a super cool feature though.

RE: Hive local mode on by default for 0.6.0

Posted by Joydeep Sen Sarma <js...@facebook.com>.
We enabled a feature called 'auto-local mode' (HIVE-1408). The query processor looks at the size of the input and decides dynamically whether local mode execution can be done. For a multi-job query, the determination is made on a per-job basis.

We enabled it by default in trunk so it can get some coverage. Local mode support in 0.6 has some bugs (in fact, a big part of this JIRA was a comprehensive test for local mode and small fixes for the bugs it uncovered). The relevant option is:

set hive.exec.mode.local.auto=<true/false>
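A rough Python sketch of the per-job decision described above. The byte threshold, the Job type, and the function names are hypothetical; the only parts taken from this thread are that the decision is driven by input size, is made separately for each job of a multi-job query, and is gated by hive.exec.mode.local.auto. Hive's actual criteria may involve other factors.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff for illustration only; not a documented Hive default.
MAX_LOCAL_INPUT_BYTES = 128 * 1024 * 1024


@dataclass
class Job:
    """One MapReduce job in a (possibly multi-job) query plan. Hypothetical type."""
    name: str
    input_bytes: int


def choose_mode(job: Job, auto_local: bool) -> str:
    """Decide for a single job whether it runs locally or on the cluster."""
    if not auto_local:  # models hive.exec.mode.local.auto=false
        return "cluster"
    return "local" if job.input_bytes <= MAX_LOCAL_INPUT_BYTES else "cluster"


def plan_query(jobs: List[Job], auto_local: bool) -> List[str]:
    """Apply the decision job by job, so one query can mix local and cluster jobs."""
    return [choose_mode(job, auto_local) for job in jobs]


if __name__ == "__main__":
    query = [Job("scan_large_table", 50 * 1024**3),
             Job("aggregate_small_result", 10 * 1024**2)]
    print(plan_query(query, auto_local=True))   # ['cluster', 'local']
    print(plan_query(query, auto_local=False))  # ['cluster', 'cluster']
```

Because the check runs per job, a query whose first stage scans a large table can still run its later, smaller stages locally, which is also why some jobs of a query may never show up in the JobTracker.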


I have been a little worried about enabling this by default; we can turn it off if required. The case that worries me most is a lot of users referring to scripts (via TRANSFORM clauses) that are available only on the cluster nodes and not on the client node. Another assumption is that mapred.local.dir is set to a value that is valid on the client side (which may not be the case if the same Hadoop config is shared across the client and server sides).

I promise to add some documentation about this on the wiki ASAP.
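The two assumptions above (TRANSFORM scripts present on the client, and a mapred.local.dir that is valid on the client) could be checked on the client before relying on local execution. This is a minimal, hypothetical Python preflight sketch, not part of Hive: the function name and its inputs are assumptions, the caller must supply the script paths and the client's mapred.local.dir value (a comma-separated list of directories), and it only tests that the scripts exist and that the directories are writable on this machine.

```python
import os
import tempfile
from typing import List


def preflight_local_mode(transform_scripts: List[str], mapred_local_dir: str) -> List[str]:
    """Hypothetical client-side check; returns a list of problems (empty if none found)."""
    problems = []
    # Scripts referenced by TRANSFORM clauses must exist on the client,
    # not just on the cluster nodes.
    for script in transform_scripts:
        if not os.path.isfile(script):
            problems.append(f"TRANSFORM script missing on client: {script}")
    # Every mapred.local.dir entry must be an existing, writable directory on the client.
    for entry in mapred_local_dir.split(","):
        path = entry.strip()
        if path and not (os.path.isdir(path) and os.access(path, os.W_OK)):
            problems.append(f"mapred.local.dir entry unusable on client: {path}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # The scratch directory stands in for a valid mapred.local.dir;
        # the script path is a made-up example that should not exist.
        print(preflight_local_mode(["/hypothetical/path/my_transform.py"], scratch))
```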

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com] 
Sent: Monday, August 09, 2010 2:22 PM
To: <hi...@hadoop.apache.org>
Subject: Hive local mode on by default for 0.6.0

I already caught someone on IRC who was very surprised by the local
mode in Hive trunk. Is local mode on by default?

Do you think the 0.6.0 release should have this on by default? There
have been a few issues like HIVE-1520, and it seems like letting this
out into the wild without users actively turning it on might expose edge
cases and complications.

Regards,
Edward