Posted to dev@hive.apache.org by gaurav jain <ja...@yahoo.com> on 2010/10/06 05:43:38 UTC

FileFormat

Hi,

insert overwrite directory "$dir" select * from xxx;

creates files of type attempt_201008201925_165088_r_000000_0.gz




insert overwrite table "$table" select * from xxx;

creates files of type attempt_201008201925_165088_r_000000_0



How can I configure "insert overwrite directory" to produce sequence files
(non-.gz)?





Regards,
Gaurav Jain


      

Re: How to output SeqFile

Posted by gaurav jain <ja...@yahoo.com>.
I tried your suggestions, first with

set hive.query.result.fileformat=sequencefile;

and then (separately) with

set hive.default.fileformat=sequencefile;


Neither works.

As per the docs, I think these options are only applied to CREATE TABLE queries?


Any other suggestions would be helpful.
gaurav jain



----- Original Message ----
From: yongqiang he <he...@gmail.com>
To: hive-dev@hadoop.apache.org
Sent: Wed, October 6, 2010 6:34:53 PM
Subject: Re: How to output SeqFile

Can you try
set hive.query.result.fileformat=sequencefile;

If that does not work, you can also try
set hive.default.fileformat=sequencefile;

thanks
yongqiang
      

Re: How to output SeqFile

Posted by yongqiang he <he...@gmail.com>.
Can you try
set hive.query.result.fileformat=sequencefile;

If that does not work, you can also try
set hive.default.fileformat=sequencefile;

thanks
yongqiang
On Wed, Oct 6, 2010 at 2:29 PM, gaurav jain <ja...@yahoo.com> wrote:
>
>
> Thanks Yang. I thought about it as well. But as you said, it's a hack.
>
> hive-dev@, can you please verify if this is possible?

Re: How to output SeqFile

Posted by gaurav jain <ja...@yahoo.com>.

Thanks Yang. I thought about it as well. But as you said, it's a hack.

hive-dev@, can you please verify if this is possible?



----- Original Message ----
From: Yang <te...@gmail.com>
To: hive-user@hadoop.apache.org
Sent: Wed, October 6, 2010 1:52:21 PM
Subject: Re: How to output SeqFile

If this is indeed a feature that is still missing, I have a hack:

Create a temp table in SeqFile format and dump your data into that table.
Since you know its location, just copy the part files from that location,
then delete that partition from the table manually. Of course, you may run
into issues such as "partition already exists" when you insert into the
temp table the next time, so you may also need to do an explicit delete
from the temp table.

Y


Re: How to output SeqFile

Posted by Yang <te...@gmail.com>.
If this is indeed a feature that is still missing, I have a hack:

Create a temp table in SeqFile format and dump your data into that table.
Since you know its location, just copy the part files from that location,
then delete that partition from the table manually. Of course, you may run
into issues such as "partition already exists" when you insert into the
temp table the next time, so you may also need to do an explicit delete
from the temp table.
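
A minimal sketch of this workaround in HiveQL, under assumptions that are not
in the thread: the staging table name tmp_seq and the columns col1 and col2
are hypothetical placeholders, and y stands for the source table. It is meant
only to illustrate the idea, not as a tested recipe for any particular Hive
version.

```sql
-- Hypothetical staging table; the name and column list are placeholders.
CREATE TABLE tmp_seq (col1 STRING, col2 STRING)
STORED AS SEQUENCEFILE;

-- Compression settings taken from the original question.
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;

-- Writing into the table lets Hive use the table's declared file format.
INSERT OVERWRITE TABLE tmp_seq
SELECT col1, col2 FROM y;

-- Prints the table metadata, including its HDFS location, from which the
-- part files can then be copied.
DESCRIBE EXTENDED tmp_seq;
```

If the staging table is left unpartitioned, as assumed here, INSERT OVERWRITE
replaces its contents on each run, which may sidestep the "partition already
exists" issue.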

Y

On Wed, Oct 6, 2010 at 1:46 PM, gaurav jain <ja...@yahoo.com> wrote:
> I was hoping there would be a configuration where I can set the outputformat for
> my query.
>
> Regards,
> Gaurav Jain

Re: How to output SeqFile

Posted by gaurav jain <ja...@yahoo.com>.
I was hoping there would be a configuration where I can set the outputformat for 
my query.

Regards,
Gaurav Jain



----- Original Message ----
From: Jacob R Rideout <ap...@jacobrideout.net>
To: hive-user@hadoop.apache.org
Sent: Wed, October 6, 2010 1:42:57 PM
Subject: Re: How to output SeqFile

On Wed, Oct 6, 2010 at 2:35 PM, gaurav jain <ja...@yahoo.com> wrote:
> I do have that.
>
> However I am not writing directly to the table partition. Instead, I first write
> my data in a tmp directory (eventually moved to the hdfs table partition) and
> then publish that partition using alter table statement in metastore.
>
> Something like this:
>
> -- create table x ... stored as SeqFile
> -- insert overwrite directory 'd' select * from table y
> -- distcp 'd'  x/dateint=.../hour=...
> -- alter table x add partition ....
>
> In the second step above I need to produce SeqFile.
>
>
> Thanks for prompt reply.
> Gaurav Jain
>
if you are inserting into the directory rather than the table, hive
won't know to look at the metadata description of the table

you need something like:
insert overwrite table x select * from y



      

Re: How to output SeqFile

Posted by Jacob R Rideout <ap...@jacobrideout.net>.
On Wed, Oct 6, 2010 at 2:35 PM, gaurav jain <ja...@yahoo.com> wrote:
> I do have that.
>
> However I am not writing directly to the table partition. Instead, I first write
> my data in a tmp directory (eventually moved to the hdfs table partition)  and
> then publish that partition using alter table statement in metastore.
>
> Something like this:
>
> -- create table x ... stored as SeqFile
> -- insert overwrite directory 'd' select * from table y
> -- distcp 'd'  x/dateint=.../hour=...
> -- alter table x add partition ....
>
> In the second step above I need to produce SeqFile.
>
>
> Thanks for prompt reply.
> Gaurav Jain

if you are inserting into the directory rather than the table, hive
won't know to look at the metadata description of the table

you need something like:
insert overwrite table x select * from y
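
To make the suggestion above concrete, here is a hedged sketch in HiveQL. It
assumes x was created as a SeqFile table partitioned by dateint and hour, as
in the workflow quoted above; the partition values are hypothetical
placeholders, and the columns of y are assumed to match the non-partition
columns of x.

```sql
-- Assumes x is declared STORED AS SEQUENCEFILE and partitioned by
-- (dateint, hour). The partition values below are placeholders chosen for
-- illustration only; they do not come from the thread.
INSERT OVERWRITE TABLE x PARTITION (dateint='20101006', hour='13')
SELECT * FROM y;
```

Because the target is a table rather than a bare directory, Hive can read the
output format from the table's metadata.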

Re: How to output SeqFile

Posted by gaurav jain <ja...@yahoo.com>.
I do have that.

However, I am not writing directly to the table partition. Instead, I first write
my data to a tmp directory (eventually moved to the HDFS table partition) and
then publish that partition in the metastore using an ALTER TABLE statement.

Something like this:

-- create table x ... stored as SeqFile
-- insert overwrite directory 'd' select * from table y
-- distcp 'd'  x/dateint=.../hour=...
-- alter table x add partition ....

In the second step above I need to produce SeqFile.


Thanks for prompt reply.
Gaurav Jain


----- Original Message ----
From: Yang <te...@gmail.com>
To: jainy_gaurav@yahoo.com
Sent: Wed, October 6, 2010 1:28:42 PM
Subject: Re: How to output SeqFile

Gaurav:

not sure if I understand your question correctly....
when you create the output table, that has an option to set the
output table SerDe

Regards
Yang


How to output SeqFile

Posted by gaurav jain <ja...@yahoo.com>.



How can I produce a sequence file from query 

insert overwrite directory ....


I have set:

SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;
set mapred.output.compression.type=BLOCK;  
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;



It seems to produce Text .gz format files.



Regards,
Gaurav Jain


      
