Posted to user@hive.apache.org by Wojciech Langiewicz <wl...@gmail.com> on 2011/07/28 15:23:08 UTC

Hive 0.7 using only one mapper

Hello,
I'm having an issue running Hive jobs after updating from Hive 0.5 to Hive
0.7 (from CDHb4 to CDHu1).

No matter what query I run, Hive always uses one mapper.
I have tried different queries with various input sizes, both with
many reducers and with no reducers.

For version 0.5 everything worked correctly.
I'm attaching my hive-site.xml: https://gist.github.com/1111531
I have also tested jobs with Pig, and those jobs use multiple mappers,
so I guess this is a Hive issue.

Thank you for all your help.

--
Wojciech Langiewicz

Re: Hive 0.7 using only one mapper

Posted by Wojciech Langiewicz <wl...@gmail.com>.

Hello,
Thank you for your answers; this solves the issue.
I have set mapred.max.split.size to 1024000000 in hive-site.xml and jobs
are now using an appropriate number of mappers.

I have experimented a little with different configurations, and
CombineHiveInputFormat gives better performance than HiveInputFormat in
my case.

Thanks again.
--
Wojciech Langiewicz


Re: Hive 0.7 using only one mapper

Posted by Carl Steinbach <ca...@cloudera.com>.

Hi Wojciech,

Vaibhav is correct. There's a configuration problem in the copy of
hive-default.xml that ships with CDH3u1 which sets
hive.input.format=CombineHiveInputFormat, but leaves mapred.max.split.size
undefined. You can fix this problem by setting mapred.max.split.size in
hive-default.xml to some reasonable value (it currently defaults
to 256000000 on trunk).
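
To check whether a given config file has this combination, a small script along the following lines may help. It is only a sketch: the function names and the sample XML are made up for illustration, and it inspects a single file, whereas Hive merges several config sources, so a clean result here does not prove the effective setting is fine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample configs, written for this example only.
BROKEN = """
<configuration>
  <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  </property>
</configuration>
"""

FIXED = """
<configuration>
  <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  </property>
  <property>
    <name>mapred.max.split.size</name>
    <value>256000000</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Strip stray whitespace around names and values.
            props[name.strip()] = (value or "").strip()
    return props


def has_unbounded_combine_splits(props):
    """True if the combining input format is set without a max split size."""
    input_format = props.get("hive.input.format", "")
    max_split = props.get("mapred.max.split.size", "")
    uses_combine = input_format.endswith("CombineHiveInputFormat")
    return uses_combine and max_split in ("", "0")


if __name__ == "__main__":
    for label, text in (("broken", BROKEN), ("fixed", FIXED)):
        print(label, has_unbounded_combine_splits(read_properties(text)))
```

To run it against a real file, read the file contents and pass them to read_properties in place of the sample strings.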

Sorry for the inconvenience.

Carl


RE: Hive 0.7 using only one mapper

Posted by "Aggarwal, Vaibhav" <va...@amazon.com>.

If you are using CombineHiveInputFormat, it might be the case that all files are being combined into one large split, and hence only one mapper gets created.

If that is the case, you can set the max split size in the hive-default.xml config file to create more splits and hence more map tasks:

<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
  <description>The maximum size chunk that map input should be split
  into.</description>
</property>
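
As a rough illustration of why a smaller max split size leads to more map tasks, here is a minimal sketch of the arithmetic. It is an approximation under stated assumptions: the function name is made up for this example, and it ignores the node and rack grouping that the combining input format also applies, so real mapper counts can differ.

```python
def estimate_combined_splits(total_input_bytes, max_split_size=None):
    """Roughly estimate how many splits (and so map tasks) result.

    Assumption: input is packed into splits of at most max_split_size
    bytes, with locality grouping ignored.
    """
    if total_input_bytes <= 0:
        return 0
    if not max_split_size:
        # No upper bound set: all input can be packed into one split.
        return 1
    # Integer ceiling division.
    return (total_input_bytes + max_split_size - 1) // max_split_size


if __name__ == "__main__":
    ten_gib = 10 * 1024**3  # hypothetical input size
    for limit in (None, 134217728, 256000000, 1024000000):
        print(limit, estimate_combined_splits(ten_gib, limit))
```

For the hypothetical 10 GiB input, an unset limit gives a single split, while 134217728 gives about 80 splits under this approximation.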

Thanks
Vaibhav


Re: Hive 0.7 using only one mapper

Posted by Edward Capriolo <ed...@gmail.com>.

On Thu, Jul 28, 2011 at 9:23 AM, Wojciech Langiewicz
<wl...@gmail.com> wrote:

> Hello,
> I'm having an issue running Hive jobs after updating from Hive 0.5 to Hive
> 0.7 (from CDHb4 to CDHu1).
>
> No matter what query I'm running Hive is always using one mapper.
> I have tried different queries with various sizes of input and ones with
> many reducers or no reducers.
>
> For version 0.5 everything worked correctly.
> I'm attaching my hive-site.xml: https://gist.github.com/1111531
> I have tested also jobs with Pig, and those jobs use multiple mappers - so
> I guess this is a Hive issue.
>
> Thank you for all your help.
>
> --
> Wojciech Langiewicz
>

You should also check that your hive-default.xml and other conf/ files are
up to date for 0.7.x. Having older versions of those files can lead to problems.

Edward