Posted to user@hive.apache.org by Jov <zh...@gmail.com> on 2011/03/30 07:22:32 UTC

Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files

Try adding a limit:

INSERT OVERWRITE LOCAL DIRECTORY
'/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
Select host, identity, user, time, request
from raw_apachelog
where ds = '2011-03-22-001500' limit 32;


2011/3/30 V.Senthil Kumar <va...@yahoo.com>:
> Hello,
>
> I have a Hive query that does a simple select and writes the results to the
> local file system.
>
>
> For example, a query like this:
>
> INSERT OVERWRITE LOCAL DIRECTORY
> '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
> Select host, identity, user, time, request
> from raw_apachelog
> where ds = '2011-03-22-001500';
>
> Now this creates two files under the apachetest folder. This table has only 32
> rows. Is there any way to make Hive create only a single file?
>
>
> Appreciate your help :)
>
> Thanks,
> Senthil
>

Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Mar 30, 2011 at 3:31 PM, V.Senthil Kumar <va...@yahoo.com> wrote:
> Thanks for the suggestion. The query created just one result file.
>
> Also, before trying this query, I found another way of making this
> work. I added the following properties to hive-site.xml, and that worked as
> well: it also created just one result file.
>
>
> <property>
>  <name>hive.merge.mapredfiles</name>
>  <value>true</value>
>  <description>Merge small files at the end of a map-reduce job</description>
> </property>
>
> <property>
>  <name>hive.input.format</name>
>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>  <description>The default input format, if it is not specified, the system
> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
> whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always
> overwrite it - if there is a bug in CombineHiveInputFormat, it can always be
> manually set to HiveInputFormat. </description>
> </property>
>
>
>

The number of files is a result of the number of reducers used in the
job. Adding a limit adds a single-reducer phase to the end of the job. You
should be able to accomplish the same thing with 'set
mapred.reduce.tasks=1'.
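
A minimal sketch of that suggestion as it might be typed in a Hive session, reusing the query from this thread. It assumes the query actually runs a reduce phase; a plain select with a where filter may compile to a map-only job, in which case this setting might have no effect on the number of files.

```sql
-- Assumption: the job has a reduce stage, so a single reducer
-- would write a single output file.
set mapred.reduce.tasks=1;

INSERT OVERWRITE LOCAL DIRECTORY
'/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
Select host, identity, user, time, request
from raw_apachelog
where ds = '2011-03-22-001500';
```

A 'set' issued this way applies only to the current session, so other jobs keep their configured reducer count.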

Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files

Posted by "V.Senthil Kumar" <va...@yahoo.com>.
Thanks for the suggestion. The query created just one result file.  

Also, before trying this query, I found another way of making this
work. I added the following properties to hive-site.xml, and that worked as
well: it also created just one result file.


<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>

<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format, if it is not specified, the system 
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, 
whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always 
overwrite it - if there is a bug in CombineHiveInputFormat, it can always be 
manually set to HiveInputFormat. </description>
</property>
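
For a one-off export, the same two properties could probably be tried per session with 'set' instead of editing hive-site.xml. This is only a sketch using the names and values quoted above; whether the merge step actually runs may also depend on other merge settings and size thresholds in the Hive version in use.

```sql
-- Session-level equivalents of the two hive-site.xml properties above.
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Then run the export query in the same session.
INSERT OVERWRITE LOCAL DIRECTORY
'/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
Select host, identity, user, time, request
from raw_apachelog
where ds = '2011-03-22-001500';
```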


