Posted to user@hive.apache.org by wd <wd...@wdicc.com> on 2010/06/09 05:20:57 UTC

set mapred.map.tasks=1 not work

hi,

I'm using Hive svn rev 946854. I tried to set mapred.map.tasks=1 in the Hive CLI,
but it seems it doesn't work; the total number of map tasks is still over 300.

Is this a problem with the svn version?

Re: set mapred.map.tasks=1 not work

Posted by wd <wd...@wdicc.com>.

I've tried JVM reuse; it didn't help either.

Total time is about 130 s. The data is only 10 MB, all in small files, on 2 nodes.

Hive/Hadoop still runs 350+ maps ...

2010/6/10 Edward Capriolo <ed...@gmail.com>

> Also consider setting up jvm reuse this will deal with some mapper
> startup penalty.
>
> How long is you query taking how much data is there? How many nodes?
>
> On Thursday, June 10, 2010, wd <wd...@wdicc.com> wrote:
> > set
> hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> >
> > and
> >
> > set hive.merge.size.per.task=1000000;
> > set hive.merge.mapfiles=true;
> >
> > seames all useless here, time token for execute 'select a, count(1) from
> t1 group by a' is almost the same.
> >
> > Have I missed some other settings ?
> >
> > 2010/6/10 wd <wd...@wdicc.com>
> >
> > Thanks everyone, I'll try CombineHiveInputFormat. :)
> >
> > 2010/6/10 Namit Jain <nj...@facebook.com>
> >
> >
> > CombineHiveInputFormat
> >
> >
>

Re: set mapred.map.tasks=1 not work

Posted by Edward Capriolo <ed...@gmail.com>.

Also consider setting up JVM reuse; this will deal with some of the mapper
startup penalty.

How long is your query taking? How much data is there? How many nodes?
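
For reference, a minimal sketch of enabling JVM reuse from the Hive CLI. It assumes the Hadoop 0.20-era property name mapred.job.reuse.jvm.num.tasks, whose default of 1 means no reuse. The "--" lines are annotations for the reader and may need to be removed in older CLI versions.

```sql
-- -1 lets a task JVM be reused for an unlimited number of tasks of the same job
set mapred.job.reuse.jvm.num.tasks=-1;
```

This only reduces the per-task JVM startup cost; it does not change how many map tasks are created.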

On Thursday, June 10, 2010, wd <wd...@wdicc.com> wrote:
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> and
>
> set hive.merge.size.per.task=1000000;
> set hive.merge.mapfiles=true;
>
> seames all useless here, time token for execute 'select a, count(1) from t1 group by a' is almost the same.
>
> Have I missed some other settings ?
>
> 2010/6/10 wd <wd...@wdicc.com>
>
> Thanks everyone, I'll try CombineHiveInputFormat. :)
>
> 2010/6/10 Namit Jain <nj...@facebook.com>
>
>
> CombineHiveInputFormat
>
>

Re: set mapred.map.tasks=1 not work

Posted by wd <wd...@wdicc.com>.

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

and

set hive.merge.size.per.task=1000000;
set hive.merge.mapfiles=true;

These all seem to have no effect here; the time taken to execute
'select a, count(1) from t1 group by a' is almost the same.

Have I missed some other settings?
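
One possible explanation, offered as an assumption rather than a diagnosis of this cluster: hive.merge.mapfiles and hive.merge.size.per.task control merging of the files a job writes, so they would not be expected to change the number of map tasks for a query that reads existing small files. With CombineHiveInputFormat selected, the size of the combined splits is governed by Hadoop's split-size properties. A sketch with illustrative values (the numbers are not from this thread):

```sql
-- Upper bound, in bytes, on the size of one combined split
set mapred.max.split.size=256000000;
-- Minimum bytes to gather per node and per rack before a split is closed
set mapred.min.split.size.per.node=256000000;
set mapred.min.split.size.per.rack=256000000;

select a, count(1) from t1 group by a;
```

Whether files from different directories or partitions end up in the same split depends on the Hive version, so the map count may still be greater than one.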

2010/6/10 wd <wd...@wdicc.com>

> Thanks everyone, I'll try CombineHiveInputFormat. :)
>
> 2010/6/10 Namit Jain <nj...@facebook.com>
>
>> CombineHiveInputFormat
>>
>
>

Re: set mapred.map.tasks=1 not work

Posted by wd <wd...@wdicc.com>.

Thanks everyone, I'll try CombineHiveInputFormat. :)

2010/6/10 Namit Jain <nj...@facebook.com>

> CombineHiveInputFormat
>

RE: set mapred.map.tasks=1 not work

Posted by Namit Jain <nj...@facebook.com>.

use CombineHiveInputFormat

check your hive.input.format
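
A short sketch of how that check might look in the Hive CLI: "set" followed by a property name prints its current value. The "--" lines are annotations for the reader and may need to be removed in older CLI versions.

```sql
-- Print the input format currently in effect for this session
set hive.input.format;
-- Switch to the combining input format for this session
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
```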

________________________________________
From: Alex Kozlov [alexvk@cloudera.com]
Sent: Wednesday, June 09, 2010 9:15 PM
To: hive-user@hadoop.apache.org
Subject: Re: set mapred.map.tasks=1 not work

Hi Wd,

Try:

hive.merge.mapfiles=true
hive.merge.size.per.task=1000000 (or some other large number)

Alex K

On Wed, Jun 9, 2010 at 6:55 PM, wd <wd...@wdicc.com> wrote:
I have lots of small files in hive, the mapred is too slow .... Is there a way to improve the speed ?

2010/6/10 Edward Capriolo <ed...@gmail.com>

On Wed, Jun 9, 2010 at 3:04 AM, wd <wd...@wdicc.com> wrote:
I've tried hive 0.5, the option not work too.
And find this page[http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results] via google.

2010/6/9 wd <wd...@wdicc.com>

hi,

I'm using hive svn rev946854. And try to set mapred.map.tasks=1 at hive cli, but seemes it doesn't work, total map tasks still over 300+.

Is this a svn version problem?

You answered your own question, look in the link

"You cannot force mapred.map.tasks but can specify mapred.reduce.tasks. "

Map tasks is based on the number of input files and folders. Even though hive uses a CombinedInput format you still can get a number of mappers.

Edward

Re: set mapred.map.tasks=1 not work

Posted by Alex Kozlov <al...@cloudera.com>.

Hi Wd,

Try:

hive.merge.mapfiles=true
hive.merge.size.per.task=1000000 (or some other large number)

Alex K


On Wed, Jun 9, 2010 at 6:55 PM, wd <wd...@wdicc.com> wrote:

> I have lots of small files in hive, the mapred is too slow .... Is there a
> way to improve the speed ?
>
> 2010/6/10 Edward Capriolo <ed...@gmail.com>
>
>>
>>
>> On Wed, Jun 9, 2010 at 3:04 AM, wd <wd...@wdicc.com> wrote:
>>
>>> I've tried hive 0.5, the option not work too.
>>> And find this page[
>>> http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results]
>>> via google.
>>>
>>> 2010/6/9 wd <wd...@wdicc.com>
>>>
>>> hi,
>>>>
>>>> I'm using hive svn rev946854. And try to set mapred.map.tasks=1 at hive
>>>> cli, but seemes it doesn't work, total map tasks still over 300+.
>>>>
>>>> Is this a svn version problem?
>>>>
>>>
>>>
>> You answered your own question, look in the link
>>
>> "You cannot force *mapred.map.tasks* but can specify mapred.reduce.tasks.
>> "
>>
>> Map tasks is based on the number of input files and folders. Even though
>> hive uses a CombinedInput format you still can get a number of mappers.
>>
>> Edward
>>
>
>

Re: set mapred.map.tasks=1 not work

Posted by Edward Capriolo <ed...@gmail.com>.

On Wed, Jun 9, 2010 at 9:55 PM, wd <wd...@wdicc.com> wrote:

> I have lots of small files in hive, the mapred is too slow .... Is there a
> way to improve the speed ?
>
> 2010/6/10 Edward Capriolo <ed...@gmail.com>
>
>
>>
>> On Wed, Jun 9, 2010 at 3:04 AM, wd <wd...@wdicc.com> wrote:
>>
>>> I've tried hive 0.5, the option not work too.
>>> And find this page[
>>> http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results]
>>> via google.
>>>
>>> 2010/6/9 wd <wd...@wdicc.com>
>>>
>>> hi,
>>>>
>>>> I'm using hive svn rev946854. And try to set mapred.map.tasks=1 at hive
>>>> cli, but seemes it doesn't work, total map tasks still over 300+.
>>>>
>>>> Is this a svn version problem?
>>>>
>>>
>>>
>> You answered your own question, look in the link
>>
>> "You cannot force *mapred.map.tasks* but can specify mapred.reduce.tasks.
>> "
>>
>> Map tasks is based on the number of input files and folders. Even though
>> hive uses a CombinedInput format you still can get a number of mappers.
>>
>> Edward
>>
>
With Hadoop 0.20 and the Combine InputFormat you should get fairly decent
performance even with many small files. My current employer is about to open
source FileCrusher, a standalone and MapReduce application that merges
Text and Sequence files into one big one. So if you hang tight for a couple
of days, I can point you at a utility that might help.
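
In the meantime, one possible way to compact small files from within Hive is to rewrite the table with the merge settings mentioned earlier in the thread enabled. This is only a sketch: t1_merged is a hypothetical table name, the size value is illustrative, and whether the merge step runs depends on the Hive version and its merge thresholds.

```sql
-- Ask Hive to merge small output files at the end of a map-only job
set hive.merge.mapfiles=true;
-- Target size, in bytes, of the merged files (example value)
set hive.merge.size.per.task=256000000;

-- t1_merged is a hypothetical table with the same schema as t1
create table t1_merged like t1;
insert overwrite table t1_merged select * from t1;
```

Later queries would then read t1_merged, which should hold the same rows in fewer, larger files.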

Re: set mapred.map.tasks=1 not work

Posted by wd <wd...@wdicc.com>.

I have lots of small files in Hive, and MapReduce is too slow. Is there a
way to improve the speed?

2010/6/10 Edward Capriolo <ed...@gmail.com>

>
>
> On Wed, Jun 9, 2010 at 3:04 AM, wd <wd...@wdicc.com> wrote:
>
>> I've tried hive 0.5, the option not work too.
>> And find this page[
>> http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results]
>> via google.
>>
>> 2010/6/9 wd <wd...@wdicc.com>
>>
>> hi,
>>>
>>> I'm using hive svn rev946854. And try to set mapred.map.tasks=1 at hive
>>> cli, but seemes it doesn't work, total map tasks still over 300+.
>>>
>>> Is this a svn version problem?
>>>
>>
>>
> You answered your own question, look in the link
>
> "You cannot force *mapred.map.tasks* but can specify mapred.reduce.tasks.
> "
>
> Map tasks is based on the number of input files and folders. Even though
> hive uses a CombinedInput format you still can get a number of mappers.
>
> Edward
>

Re: set mapred.map.tasks=1 not work

Posted by Edward Capriolo <ed...@gmail.com>.

On Wed, Jun 9, 2010 at 3:04 AM, wd <wd...@wdicc.com> wrote:

> I've tried hive 0.5, the option not work too.
> And find this page[
> http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results]
> via google.
>
> 2010/6/9 wd <wd...@wdicc.com>
>
> hi,
>>
>> I'm using hive svn rev946854. And try to set mapred.map.tasks=1 at hive
>> cli, but seemes it doesn't work, total map tasks still over 300+.
>>
>> Is this a svn version problem?
>>
>
>
You answered your own question; look in the link:

"You cannot force mapred.map.tasks but can specify mapred.reduce.tasks."

The number of map tasks is based on the number of input files and folders. Even
though Hive uses a combined input format, you can still end up with a number of mappers.

Edward

Re: set mapred.map.tasks=1 not work

Posted by wd <wd...@wdicc.com>.

I've tried Hive 0.5; the option doesn't work there either.
I found this page via Google:
http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results

2010/6/9 wd <wd...@wdicc.com>

> hi,
>
> I'm using hive svn rev946854. And try to set mapred.map.tasks=1 at hive
> cli, but seemes it doesn't work, total map tasks still over 300+.
>
> Is this a svn version problem?
>