Posted to user@hive.apache.org by Wojciech Langiewicz <wl...@gmail.com> on 2011/07/28 15:23:08 UTC
Hive 0.7 using only one mapper
Hello,
I'm having an issue running Hive jobs after upgrading from Hive 0.5 to Hive
0.7 (from CDH3b4 to CDH3u1).
No matter what query I run, Hive always uses only one mapper.
I have tried different queries with various input sizes, with many
reducers and with no reducers.
With version 0.5 everything worked correctly.
I'm attaching my hive-site.xml: https://gist.github.com/1111531
I have also tested jobs with Pig, and those jobs use multiple mappers,
so I guess this is a Hive issue.
Thank you for all your help.
--
Wojciech Langiewicz
Re: Hive 0.7 using only one mapper
Posted by Wojciech Langiewicz <wl...@gmail.com>.
Hello,
Thank you for your answers; this solves the issue.
I have set mapred.max.split.size to 1024000000 in hive-site.xml, and jobs
now use an appropriate number of mappers.
I have experimented a little with different configurations, and in my case
CombineHiveInputFormat gives better performance than HiveInputFormat.
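Setting the property is a one-line XML change, which Wojciech made in hive-site.xml. For anyone who prefers to script it, here is a minimal Python sketch using only the standard library. set_property is a hypothetical helper; it assumes the usual Hadoop configuration layout (a <configuration> root holding <property> elements with <name> and <value>). Because ElementTree's default parser discards comments and anything outside the root element, treat it as an illustration rather than a safe in-place editor for a real config file.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Hypothetical helper: set (or add) a <property> in Hadoop-style config XML.

    Assumes a <configuration> root with <property><name/><value/></property>
    children. Comments and anything outside the root element are not preserved.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value  # overwrite the existing value
            break
    else:
        # Property not present yet: append a new one to the root element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<configuration></configuration>"
    print(set_property(sample, "mapred.max.split.size", "1024000000"))
```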
Thanks again.
--
Wojciech Langiewicz
On 29.07.2011 05:43, Carl Steinbach wrote:
> Hi Wojciech,
>
> Vaibhav is correct. There's a configuration problem in the copy of
> hive-default.xml that ships with CDH3u1 which sets
> hive.input.format=CombineHiveInputFormat, but leaves mapred.max.split.size
> undefined. You can fix this problem by setting mapred.max.split.size in
> hive-default.xml to some reasonable value (it currently defaults
> to 256000000 on trunk).
>
> Sorry for the inconvenience.
>
> Carl
Re: Hive 0.7 using only one mapper
Posted by Carl Steinbach <ca...@cloudera.com>.
Hi Wojciech,
Vaibhav is correct. There's a configuration problem in the copy of
hive-default.xml that ships with CDH3u1 which sets
hive.input.format=CombineHiveInputFormat, but leaves mapred.max.split.size
undefined. You can fix this problem by setting mapred.max.split.size in
hive-default.xml to some reasonable value (it currently defaults
to 256000000 on trunk).
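The misconfiguration described above (CombineHiveInputFormat enabled, mapred.max.split.size missing) can be checked mechanically. The Python sketch below is a minimal example under stated assumptions: the input is a single Hadoop-style configuration file, check_combine_split_config and its warning texts are hypothetical, and the whitespace check reflects an assumption that older Hadoop releases may not trim numeric values. It only sees the one file you pass it, while Hive layers hive-site.xml over hive-default.xml, so a warning here is a prompt to check the other file rather than proof of a problem.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_properties(xml_text: str) -> Dict[str, str]:
    """Return {name: raw value} for every <property> in Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # Keep the raw value text so stray whitespace can be reported.
            props[name.strip()] = prop.findtext("value") or ""
    return props


def check_combine_split_config(xml_text: str) -> List[str]:
    """Hypothetical check: flag CombineHiveInputFormat without a usable max split size."""
    props = read_properties(xml_text)
    warnings: List[str] = []
    input_format = props.get("hive.input.format", "").strip()
    if not input_format.endswith("CombineHiveInputFormat"):
        return warnings  # only relevant when splits are being combined
    raw = props.get("mapred.max.split.size")
    if raw is None:
        warnings.append("mapred.max.split.size is not set; inputs may collapse into one split")
    elif raw != raw.strip():
        # Assumption: older Hadoop releases may not trim numeric values.
        warnings.append("mapred.max.split.size has surrounding whitespace")
    elif not raw.isdigit() or int(raw) <= 0:
        warnings.append("mapred.max.split.size is not a positive integer")
    return warnings


if __name__ == "__main__":
    sample = """<configuration>
  <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  </property>
</configuration>"""
    print(check_combine_split_config(sample))
    # prints ['mapred.max.split.size is not set; inputs may collapse into one split']
```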
Sorry for the inconvenience.
Carl
On Thu, Jul 28, 2011 at 11:28 AM, Aggarwal, Vaibhav <va...@amazon.com> wrote:
> If you are using CombineHiveInputFormat, it might be the case that all files
> are being combined into one large split and hence only one mapper gets created.
RE: Hive 0.7 using only one mapper
Posted by "Aggarwal, Vaibhav" <va...@amazon.com>.
If you are using CombineHiveInputFormat, it might be the case that all files are being combined into one large split and hence only one mapper gets created.
If that is the case, you can set the max split size in the hive-default.xml config file to create more splits and hence more map tasks:
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
  <description>The maximum size chunk that map input should be split into.</description>
</property>
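A simplified model makes the "one large split" effect concrete. The Hadoop documentation for CombineFileInputFormat, which CombineHiveInputFormat relies on, says that when no maximum split size is set, blocks from the same rack are combined into a single split; on a single-rack cluster that means a single mapper. The Python sketch below is an approximation, not Hadoop's actual algorithm: estimate_combined_splits is a hypothetical function that assumes every block sits on one rack, ignores node locality and the minimum split sizes, and closes a split as soon as it reaches the limit. The example value above, 134217728 bytes, is 128 MB.

```python
from typing import List, Optional


def estimate_combined_splits(block_sizes: List[int],
                             max_split_size: Optional[int]) -> List[List[int]]:
    """Hypothetical model: greedily pack block sizes (bytes) into combined splits.

    Simplifying assumptions: one rack, no node locality, no minimum split sizes.
    A max_split_size of None or 0 models "mapred.max.split.size not set".
    """
    if not max_split_size:
        # No limit: every block on the (single) rack lands in one split.
        return [list(block_sizes)] if block_sizes else []
    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in block_sizes:
        current.append(size)
        current_size += size
        # Close the split once it reaches the limit; it may overshoot by part of a block.
        if current_size >= max_split_size:
            splits.append(current)
            current, current_size = [], 0
    if current:
        splits.append(current)  # leftover blocks form a final, smaller split
    return splits


if __name__ == "__main__":
    blocks = [64 * 1024 * 1024] * 40  # made-up input: 40 blocks of 64 MB
    for limit in (None, 134217728, 256000000, 1024000000):
        print(limit, len(estimate_combined_splits(blocks, limit)))
```

With this made-up 2.5 GB input, the loop reports 1, 20, 10 and 3 splits for an unset limit, 134217728, 256000000 and 1024000000 respectively. Real split counts also depend on how blocks are spread across nodes and racks.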
Thanks
Vaibhav
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Thursday, July 28, 2011 7:10 AM
To: user@hive.apache.org
Subject: Re: Hive 0.7 using only one mapper
You should also check that your hive-default.xml and other conf/ files are up to date for 0.7.x. Having older versions of those files can lead to problems.
Edward
Re: Hive 0.7 using only one mapper
Posted by Edward Capriolo <ed...@gmail.com>.
You should also check that your hive-default.xml and other conf/ files are up
to date for 0.7.x. Having older versions of those files can lead to problems.
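One way to act on this advice is to compare the properties in the hive-default.xml you are actually running with those in the copy shipped with the 0.7.x package. The Python sketch below assumes both files use the usual Hadoop configuration XML layout; diff_config_properties is a hypothetical helper, and the two inline documents in the example are made-up inputs rather than real Hive defaults.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def load_properties(xml_text: str) -> Dict[str, str]:
    """Map property name -> whitespace-stripped value for Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    return {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
        if prop.findtext("name")  # skip properties without a name
    }


def diff_config_properties(old_xml: str,
                           new_xml: str) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical helper: return (only_in_old, only_in_new, changed_values), sorted."""
    old, new = load_properties(old_xml), load_properties(new_xml)
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return only_old, only_new, changed


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    old_conf = ("<configuration><property><name>hive.input.format</name>"
                "<value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>"
                "</property></configuration>")
    new_conf = ("<configuration><property><name>hive.input.format</name>"
                "<value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>"
                "</property><property><name>mapred.max.split.size</name>"
                "<value>256000000</value></property></configuration>")
    print(diff_config_properties(old_conf, new_conf))
    # prints ([], ['mapred.max.split.size'], ['hive.input.format'])
```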
Edward