Posted to user@hive.apache.org by 王锋 <wf...@163.com> on 2012/09/19 16:25:54 UTC

size of RCFile in hive

Hi
   I tried to convert many small text files into RCFiles and merge them using Hive SQL, but Hive still produced some small RCFiles.
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
set io.compression.codecs=com.hadoop.compression.lzo.LzoCodec;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=640000000;
set hive.merge.size.smallfiles.avgsize=80000000;
insert overwrite table rctable select .....


  the settings:
hive.merge.mapfiles=true
hive.merge.mapredfiles=true
hive.merge.size.per.task=640000000
hive.merge.size.smallfiles.avgsize=80000000
didn't work.


Could anyone tell me how to solve this?

Re: Re: size of RCFile in hive

Posted by gemini alex <ge...@gmail.com>.
1. Lower your mapper count.
2. Chen Song's suggestion also works.
3. Use a shell command to cat your small files into a bigger one.
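A sketch of option 1 using Hive's combine input format (the 256 MB split size is illustrative, not a recommendation; the property names are the old mapred.* ones used elsewhere in this thread):

```sql
-- Sketch: combine many small input splits into fewer, larger ones,
-- so fewer mappers run and fewer output files are emitted.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.max.split.size=256000000;
set mapred.min.split.size.per.node=256000000;
set mapred.min.split.size.per.rack=256000000;
```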


2012/9/27 Chen Song <ch...@gmail.com>

> You can force reduce phase by adding distribute by or order by clause
> after your select query.
>
> On Thu, Sep 27, 2012 at 2:03 PM, 王锋 <wf...@163.com> wrote:
>
>> but it's map only job
>>
>>
>> At 2012-09-27 05:39:39,"Chen Song" <ch...@gmail.com> wrote:
>>
>> As far as I know, the number of files emitted would be determined by the
>> number of mappers for a map only job and the number of reducers for a map
>> reduce job.
>>
>> So it totally depends how your query translates into a MR job.
>>
>> You can enforce it by setting the property
>>
>> *mapred.reduce.tasks=1*
>>
>> Chen
>>
>> On Wed, Sep 19, 2012 at 11:25 PM, 王锋 <wf...@163.com> wrote:
>>
>>> Hi
>>>    I tried to convert and merge many small text files using RCFiles
>>> using hivesql,but hive  produced some small rcfiles.
>>> set hive.exec.compress.output=true;
>>> set mapred.output.compress=true;
>>> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
>>> set io.compression.codecs=com.hadoop.compression.lzo.LzoCodec;
>>> hive.merge.mapfiles=true
>>> hive.merge.mapredfiles=true
>>> hive.merge.size.per.task=640000000
>>> hive.merge.size.smallfiles.avgsize=80000000
>>> insert  overwrite table rctable select .....
>>>
>>>
>>>   the settings:
>>> hive.merge.mapfiles=true
>>> hive.merge.mapredfiles=true
>>> hive.merge.size.per.task=640000000
>>> hive.merge.size.smallfiles.avgsize=80000000
>>> didn't work.
>>>
>>>
>>> who could tell me how to solve it?
>>
>>
>>
>>
>> --
>> Chen Song
>>
>>
>>
>>
>>
>
>
> --
> Chen Song
>
>
>

Re: Re: size of RCFile in hive

Posted by Chen Song <ch...@gmail.com>.
You can force a reduce phase by adding a DISTRIBUTE BY or ORDER BY clause
after your select query.
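A minimal sketch of that approach (the table name src_texts and column dt are hypothetical stand-ins for the query elided in the original question):

```sql
-- Sketch: DISTRIBUTE BY adds a shuffle, so the job gets a reduce phase
-- and the output file count follows the reducer count, not the mapper count.
-- src_texts and dt are hypothetical placeholders.
set mapred.reduce.tasks=4;
insert overwrite table rctable
select * from src_texts
distribute by dt;
```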

On Thu, Sep 27, 2012 at 2:03 PM, 王锋 <wf...@163.com> wrote:

> but it's map only job
>
>
> At 2012-09-27 05:39:39,"Chen Song" <ch...@gmail.com> wrote:
>
> As far as I know, the number of files emitted would be determined by the
> number of mappers for a map only job and the number of reducers for a map
> reduce job.
>
> So it totally depends how your query translates into a MR job.
>
> You can enforce it by setting the property
>
> *mapred.reduce.tasks=1*
>
> Chen
>
> On Wed, Sep 19, 2012 at 11:25 PM, 王锋 <wf...@163.com> wrote:
>
>> Hi
>>    I tried to convert and merge many small text files using RCFiles using
>> hivesql,but hive  produced some small rcfiles.
>> set hive.exec.compress.output=true;
>> set mapred.output.compress=true;
>> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
>> set io.compression.codecs=com.hadoop.compression.lzo.LzoCodec;
>> hive.merge.mapfiles=true
>> hive.merge.mapredfiles=true
>> hive.merge.size.per.task=640000000
>> hive.merge.size.smallfiles.avgsize=80000000
>> insert  overwrite table rctable select .....
>>
>>
>>   the settings:
>> hive.merge.mapfiles=true
>> hive.merge.mapredfiles=true
>> hive.merge.size.per.task=640000000
>> hive.merge.size.smallfiles.avgsize=80000000
>> didn't work.
>>
>>
>> who could tell me how to solve it?
>
>
>
>
> --
> Chen Song
>
>
>
>
>


-- 
Chen Song

Re: Re: size of RCFile in hive

Posted by 王锋 <wf...@163.com>.
But it's a map-only job.

At 2012-09-27 05:39:39,"Chen Song" <ch...@gmail.com> wrote:
As far as I know, the number of files emitted would be determined by the number of mappers for a map only job and the number of reducers for a map reduce job.


So it totally depends how your query translates into a MR job.


You can enforce it by setting the property


mapred.reduce.tasks=1


Chen


On Wed, Sep 19, 2012 at 11:25 PM, 王锋 <wf...@163.com> wrote:
Hi
   I tried to convert and merge many small text files using RCFiles using hivesql,but hive  produced some small rcfiles.
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
set io.compression.codecs=com.hadoop.compression.lzo.LzoCodec;
hive.merge.mapfiles=true
hive.merge.mapredfiles=true
hive.merge.size.per.task=640000000
hive.merge.size.smallfiles.avgsize=80000000
insert  overwrite table rctable select .....


  the settings:
hive.merge.mapfiles=true
hive.merge.mapredfiles=true
hive.merge.size.per.task=640000000
hive.merge.size.smallfiles.avgsize=80000000
didn't work.


who could tell me how to solve it?





--
Chen Song






Re: size of RCFile in hive

Posted by Chen Song <ch...@gmail.com>.
As far as I know, the number of files emitted is determined by the number
of mappers for a map-only job, and by the number of reducers for a
map-reduce job.

So it depends entirely on how your query translates into an MR job.

You can control it by setting the property

*mapred.reduce.tasks=1*
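Applied to the original question's insert, a hedged sketch (the select list is a hypothetical stand-in for the elided query; note that mapred.reduce.tasks only matters once the job actually has a reduce phase):

```sql
-- Sketch: a single reducer yields a single output RCFile.
-- This serializes all output through one task, which can be slow for
-- large data sets. col1/col2 and src_texts are hypothetical placeholders.
set mapred.reduce.tasks=1;
insert overwrite table rctable
select col1, col2 from src_texts
distribute by col1;  -- forces a reduce phase for an otherwise map-only job
```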

Chen

On Wed, Sep 19, 2012 at 11:25 PM, 王锋 <wf...@163.com> wrote:

> Hi
>    I tried to convert and merge many small text files using RCFiles using
> hivesql,but hive  produced some small rcfiles.
> set hive.exec.compress.output=true;
> set mapred.output.compress=true;
> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> set io.compression.codecs=com.hadoop.compression.lzo.LzoCodec;
> hive.merge.mapfiles=true
> hive.merge.mapredfiles=true
> hive.merge.size.per.task=640000000
> hive.merge.size.smallfiles.avgsize=80000000
> insert  overwrite table rctable select .....
>
>
>   the settings:
> hive.merge.mapfiles=true
> hive.merge.mapredfiles=true
> hive.merge.size.per.task=640000000
> hive.merge.size.smallfiles.avgsize=80000000
> didn't work.
>
>
> who could tell me how to solve it?




-- 
Chen Song
