Posted to user@hive.apache.org by wd <wd...@wdicc.com> on 2010/06/09 05:20:57 UTC
set mapred.map.tasks=1 not work
Hi,
I'm using Hive svn rev946854 and tried to set mapred.map.tasks=1 in the Hive CLI,
but it seems it doesn't work: the total number of map tasks is still 300+.
Is this a problem with the svn version?
Re: set mapred.map.tasks=1 not work
Posted by wd <wd...@wdicc.com>.
I've tried JVM reuse; it doesn't help either.
Total time is about 130 s; the data is only 10 MB, all small files, on 2 nodes.
Hive/Hadoop still runs 350+ maps...
2010/6/10 Edward Capriolo <ed...@gmail.com>
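The figures above (about 130 s for roughly 10 MB on 2 nodes with 350+ maps) suggest that per-task overhead, not data volume, dominates. A rough back-of-the-envelope sketch, assuming 2 map slots per node (a hypothetical value; check mapred.tasktracker.map.tasks.maximum on the actual cluster) and that every map takes about the same time:

```python
import math


def map_waves(num_maps: int, nodes: int, slots_per_node: int) -> int:
    """Number of scheduling 'waves' needed when each node runs a fixed number of maps at once."""
    return math.ceil(num_maps / (nodes * slots_per_node))


def seconds_per_wave(total_seconds: float, waves: int) -> float:
    """Average wall-clock time per wave, assuming the waves run back to back."""
    return total_seconds / waves


if __name__ == "__main__":
    # Map count, node count and total time come from the thread;
    # slots_per_node=2 is an assumption, not a reported value.
    waves = map_waves(num_maps=350, nodes=2, slots_per_node=2)
    print(waves, round(seconds_per_wave(130, waves), 2))
```

Under those assumptions this prints `88 1.48`: about 88 back-to-back waves of roughly 1.5 s each. That is an upper bound on map time per wave, since the 130 s also includes the reduce phase of the GROUP BY, but it shows why per-task startup and scheduling cost matters far more than the 10 MB of data.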
Re: set mapred.map.tasks=1 not work
Posted by Edward Capriolo <ed...@gmail.com>.
Also consider setting up JVM reuse; this will reduce some of the mapper
startup penalty.
How long is your query taking, how much data is there, and how many nodes?
On Thursday, June 10, 2010, wd <wd...@wdicc.com> wrote:
Re: set mapred.map.tasks=1 not work
Posted by wd <wd...@wdicc.com>.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
and
set hive.merge.size.per.task=1000000;
set hive.merge.mapfiles=true;
seem to be useless here; the time taken to execute 'select a, count(1) from t1
group by a' is almost the same.
Have I missed some other settings?
2010/6/10 wd <wd...@wdicc.com>
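One plausible reason the merge settings made no difference: the hive.merge.* options control merging of the small files a job writes after it finishes, so they do not change how many mappers read the existing input. CombineHiveInputFormat, by contrast, packs several small files into one split; it builds on Hadoop 0.20's CombineFileInputFormat, whose split size limit is typically governed by mapred.max.split.size. The sketch below is a deliberately simplified model of that packing (it ignores the node and rack locality the real implementation uses to group blocks), with hypothetical 30,000-byte files:

```python
from typing import List


def combine_splits(file_sizes: List[int], max_split_size: int) -> List[List[int]]:
    """Greedily pack small files into combined splits.

    Simplified model: files are taken in order and added to the current split
    until it reaches max_split_size bytes; locality is ignored entirely.
    """
    if max_split_size <= 0:
        raise ValueError("max_split_size must be positive in this sketch")
    splits: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        current.append(size)
        current_bytes += size
        if current_bytes >= max_split_size:
            # Close this split once it has reached the size limit.
            splits.append(current)
            current, current_bytes = [], 0
    if current:
        splits.append(current)  # leftover files form a final, smaller split
    return splits


if __name__ == "__main__":
    small_files = [30_000] * 350  # hypothetical: ~10 MB spread over 350 files
    print(len(combine_splits(small_files, 256 * 1024 * 1024)))  # large limit
    print(len(combine_splits(small_files, 1_000_000)))          # ~1 MB limit
```

This prints `1` and then `11`: with a generous limit all 350 files fit in one split, while a 1 MB limit yields 11 splits. Either is far fewer than the one-split-per-file behaviour of the default input format, so if the map count stays at 350+, it is worth confirming that hive.input.format really took effect and that the cluster runs Hadoop 0.20.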
Re: set mapred.map.tasks=1 not work
Posted by wd <wd...@wdicc.com>.
Thanks everyone, I'll try CombineHiveInputFormat. :)
2010/6/10 Namit Jain <nj...@facebook.com>
RE: set mapred.map.tasks=1 not work
Posted by Namit Jain <nj...@facebook.com>.
Use CombineHiveInputFormat;
check your hive.input.format setting.
________________________________________
From: Alex Kozlov [alexvk@cloudera.com]
Sent: Wednesday, June 09, 2010 9:15 PM
To: hive-user@hadoop.apache.org
Subject: Re: set mapred.map.tasks=1 not work
Re: set mapred.map.tasks=1 not work
Posted by Alex Kozlov <al...@cloudera.com>.
Hi Wd,
Try:
hive.merge.mapfiles=true
hive.merge.size.per.task=1000000 (or some other large number)
Alex K
On Wed, Jun 9, 2010 at 6:55 PM, wd <wd...@wdicc.com> wrote:
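Note that hive.merge.size.per.task is a size in bytes, so 1000000 is only about 1 MB, and the merge step applies to the files a query writes rather than to the files it reads. A minimal sketch, assuming the merge stage produces roughly ceil(total_output_bytes / size_per_task) files (a simplification; the real planner may also cap the number of merge tasks):

```python
def estimated_merged_files(total_output_bytes: int, size_per_task: int) -> int:
    """Rough count of files left after Hive's small-file merge step (simplified model)."""
    if size_per_task <= 0:
        raise ValueError("size_per_task must be positive")
    # Ceiling division using integers only; never report fewer than one file.
    return max(1, -(-total_output_bytes // size_per_task))


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    print(estimated_merged_files(ten_mb, 1_000_000))    # setting used in this thread
    print(estimated_merged_files(ten_mb, 256_000_000))  # a much larger target
```

Under this model, about 10 MB of output would still end up as 11 files with the 1000000 setting, versus a single file with a 256 MB target. To compact an existing table, the merge settings only help when a query rewrites the data (for example, an INSERT OVERWRITE of the table).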
Re: set mapred.map.tasks=1 not work
Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Jun 9, 2010 at 9:55 PM, wd <wd...@wdicc.com> wrote:
With Hadoop 0.20 and the Combine InputFormat you should get fairly decent
performance even with many small files. My current employer is about to open
source FileCrusher, a standalone and MapReduce application that merges
Text and Sequence files into one big one. So if you hang tight for a couple of
days, I can point you at a utility that might help.
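FileCrusher's code is not part of this thread. As a hypothetical illustration of the same idea (concatenating many small text files into a few larger ones), here is a local-filesystem sketch; `crush_text_files` and its batching rule are assumptions made for illustration, while a real tool like the one described above would work on HDFS and also handle SequenceFiles:

```python
import tempfile
from pathlib import Path
from typing import List


def crush_text_files(src_dir: Path, dst_dir: Path, target_bytes: int) -> List[Path]:
    """Concatenate small text files from src_dir into files of roughly target_bytes.

    A new output file is started when adding the next input would exceed
    target_bytes; an oversized input still gets an output file of its own.
    Returns the list of output paths in the order they were written.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    batch: List[bytes] = []
    batch_bytes = 0

    def flush() -> None:
        nonlocal batch, batch_bytes
        if batch:
            out = dst_dir / f"part-{len(outputs):05d}"
            out.write_bytes(b"".join(batch))
            outputs.append(out)
            batch, batch_bytes = [], 0

    for path in sorted(p for p in src_dir.iterdir() if p.is_file()):
        data = path.read_bytes()
        # Keep line boundaries intact when files are glued together.
        if data and not data.endswith(b"\n"):
            data += b"\n"
        if batch and batch_bytes + len(data) > target_bytes:
            flush()
        batch.append(data)
        batch_bytes += len(data)
    flush()
    return outputs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "small"), Path(tmp, "crushed")
        src.mkdir()
        for i in range(10):
            (src / f"f{i:02d}.txt").write_bytes(b"x" * 99 + b"\n")
        print(len(crush_text_files(src, dst, 350)))
```

With ten 100-byte files and a 350-byte target, this prints `4`: three outputs of 300 bytes and one of 100 bytes. Fewer, larger files mean fewer splits and therefore fewer map tasks for later queries.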
Re: set mapred.map.tasks=1 not work
Posted by wd <wd...@wdicc.com>.
I have lots of small files in Hive, and the MapReduce jobs are too slow... Is there a
way to improve the speed?
2010/6/10 Edward Capriolo <ed...@gmail.com>
Re: set mapred.map.tasks=1 not work
Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Jun 9, 2010 at 3:04 AM, wd <wd...@wdicc.com> wrote:
You answered your own question; look at the link:
"You cannot force mapred.map.tasks but can specify mapred.reduce.tasks."
The number of map tasks is based on the number of input files and folders. Even though
Hive uses a CombinedInput format, you can still end up with a large number of mappers.
Edward
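Edward's point can be made concrete with a simplified model of how Hadoop 0.20's old-API FileInputFormat (org.apache.hadoop.mapred.FileInputFormat) turns files into splits. mapred.map.tasks is passed in only as a hint that sets a goal split size; it can make splits smaller but never merges files, because every file yields at least one split. The model ignores compressed (non-splittable) files, locality, and Hive's own input formats, and the 64 MB block size and 30,000-byte files are assumptions:

```python
from typing import Sequence

SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger than split_size


def count_splits(file_sizes: Sequence[int], map_tasks_hint: int = 1,
                 block_size: int = 64 * 1024 * 1024, min_split_size: int = 1) -> int:
    """Approximate the number of map tasks for a plain FileInputFormat job.

    map_tasks_hint plays the role of mapred.map.tasks: it only determines
    goal_size, so it can shrink splits but cannot combine separate files.
    """
    min_split_size = max(min_split_size, 1)
    total_size = sum(file_sizes)
    goal_size = total_size // max(map_tasks_hint, 1)
    splits = 0
    for length in file_sizes:
        if length == 0:
            splits += 1  # empty files still get their own (empty) split
            continue
        split_size = max(min_split_size, min(goal_size, block_size))
        remaining = length
        # Cut full-size splits while the remainder is clearly larger than one split.
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        splits += 1  # the final (possibly short) piece of this file
    return splits


if __name__ == "__main__":
    small_files = [30_000] * 350  # hypothetical: ~10 MB in 350 files
    print(count_splits(small_files, map_tasks_hint=1))           # mapred.map.tasks=1
    print(count_splits([200 * 1024 * 1024], map_tasks_hint=1))   # one large file
```

This prints `350` and then `4`: with mapred.map.tasks=1, 350 small files still produce 350 splits, consistent with the 300+ maps reported at the top of the thread, whereas a single 200 MB file is cut into 4 block-sized splits.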
Re: set mapred.map.tasks=1 not work
Posted by wd <wd...@wdicc.com>.
I've tried Hive 0.5, and the option doesn't work there either.
I found this page via Google:
http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results
2010/6/9 wd <wd...@wdicc.com>