You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Suhail Doshi <di...@gmail.com> on 2009/04/25 18:40:02 UTC

Truncate Table

Hello,

Is there a way to truncate a table instead of dropping it and the creating
it again? I've been looking through the docs and haven't found anything.

Suhail

-- 
http://mixpanel.com
Blog: http://blog.mixpanel.com

Re: Truncate Table

Posted by Frederick Oko <fr...@gmail.com>.
I had commented a while back on the trust-HDFS removal in HIVE-126 as at hi5
we had always been writing to HDFS directly instead of via Hive CLI load. We
agreed a config to reenable this behavior would be good so as to not require
coupling of Hive into the data load process tho I agree HIVE-91 external
table support et al was a nice add. Not sure if that amounts to great
demand.

As an extension of this philosophy as Hive as just a processing overlay it
was preferable to distinguish schema over data-mod operations e.g. a DROP
TABLE that does not touch the data w/o something like 'INCLUDING DATA'
whereas a TRUNCATE is a clear data-mod request.

On Apr 28, 2009 10:50 AM, "Prasad Chakka" <pc...@facebook.com> wrote:

 By having partition level metadata and not deriving it, quite a few things
can be achieved


   - partitions can be external (i.e. Location of a partitions need not be
   relative to the table)
   - partitions level schemas so that a table’s schema can evolve
   - partitions can be disabled without deleting the actual data


So that change was made to facilitate above things. Without that it will be
difficult to achieve this.

Instead of doing hadoop dfs –rmr <partition location>, you can use ‘alter
table <tbl_name> drop partition (<parition key value>)’ and ‘load data ...
Into <tabl> <part>’ to load data. I suppose we can give option to truncate
partition, but for now dropping a partition or truncating it doesn’t have
much difference though.

But if there is a great demand for automatically inferring partitions, we
can put the old code back in but enable it only with a config param.

Thanks,
Prasad
------------------------------
*From: *Frederick Oko <fr...@gmail.com>
*Reply-To: *<hi...@hadoop.apache.org>
*Date: *Sun, 26 Apr 2009 16:10:29 -0700

To: <hi...@hadoop.apache.org>
*Subject: *Re: RE: Truncate Table

Keep in mind that if your use case is to guarantee an empty table to load
into then every 'insert ...

> On Apr 26, 2009 12:34 AM, "Ashish Thusoo" <at...@facebook.com> wrote: >
> Suhail, > > Hive does ...

Re: Truncate Table

Posted by Prasad Chakka <pc...@facebook.com>.
By having partition level metadata and not deriving it, quite a few things can be achieved


 *   partitions can be external (i.e. Location of a partitions need not be relative to the table)
 *   partitions level schemas so that a table's schema can evolve
 *   partitions can be disabled without deleting the actual data

So that change was made to facilitate above things. Without that it will be difficult to achieve this.

Instead of doing hadoop dfs -rmr <partition location>, you can use 'alter table <tbl_name> drop partition (<parition key value>)' and 'load data ... Into <tabl> <part>' to load data. I suppose we can give option to truncate partition, but for now dropping a partition or truncating it doesn't have much difference though.

But if there is a great demand for automatically inferring partitions, we can put the old code back in but enable it only with a config param.

Thanks,
Prasad
________________________________
From: Frederick Oko <fr...@gmail.com>
Reply-To: <hi...@hadoop.apache.org>
Date: Sun, 26 Apr 2009 16:10:29 -0700
To: <hi...@hadoop.apache.org>
Subject: Re: RE: Truncate Table

Keep in mind that if your use case is to guarantee an empty table to load into then every 'insert overwrite table' does just that for you (it clobbers vs insert into). This and the workaround (which must be table specific) is complicated when you have partitions. If your usecase is reclaiming space, just data invalidation, or have partitions simply deleting the base data via hadoop -rmr /user/hive/warehouse/<table>/* (assuming default data mgmt) is decoupled but works and should not be harmful -- Hive is supposed to be just an overlay but there was a change to partition mgmt that I would like to hear whether people think raw delete is now undesirable.

On Apr 26, 2009 12:34 AM, "Ashish Thusoo" <at...@facebook.com> wrote:

Suhail,

Hive does not support truncate yet. I think Prasad already opened a JIRA for this

https://issues.apache.org/jira/browse/HIVE-446

Matt,

I must say that this is a very creative workaround :)

Ashish
________________________________________
From: Matt Pestritto [matt@pestritto.com]
Sent: Saturday, April 25, 2009 10:08 AM
To: hive-user@hadoop.apache.org
Subject: Re: Truncate Table

I've used the following: Insert overwrite tbl select * from tbl where col1 = 'fake value' On 4/25/...


Re: RE: Truncate Table

Posted by Frederick Oko <fr...@gmail.com>.
Keep in mind that if your use case is to guarantee an empty table to load
into then every 'insert overwrite table' does just that for you (it clobbers
vs insert into). This and the workaround (which must be table specific) is
complicated when you have partitions. If your usecase is reclaiming space,
just data invalidation, or have partitions simply deleting the base data via
hadoop -rmr /user/hive/warehouse/<table>/* (assuming default data mgmt) is
decoupled but works and should not be harmful -- Hive is supposed to be just
an overlay but there was a change to partition mgmt that I would like to
hear whether people think raw delete is now undesirable.

On Apr 26, 2009 12:34 AM, "Ashish Thusoo" <at...@facebook.com> wrote:

Suhail,

Hive does not support truncate yet. I think Prasad already opened a JIRA for
this

https://issues.apache.org/jira/browse/HIVE-446

Matt,

I must say that this is a very creative workaround :)

Ashish
________________________________________
From: Matt Pestritto [matt@pestritto.com]
Sent: Saturday, April 25, 2009 10:08 AM
To: hive-user@hadoop.apache.org
Subject: Re: Truncate Table

I've used the following: Insert overwrite tbl select * from tbl where col1 =
'fake value' On 4/25/...

RE: Truncate Table

Posted by Ashish Thusoo <at...@facebook.com>.
Suhail, 

Hive does not support truncate yet. I think Prasad already opened a JIRA for this

https://issues.apache.org/jira/browse/HIVE-446

Matt,

I must say that this is a very creative workaround :)

Ashish
________________________________________
From: Matt Pestritto [matt@pestritto.com]
Sent: Saturday, April 25, 2009 10:08 AM
To: hive-user@hadoop.apache.org
Subject: Re: Truncate Table

I've used the following:
Insert overwrite tbl select * from tbl where col1 = 'fake value'

On 4/25/09, Suhail Doshi <di...@gmail.com> wrote:
> Hello,
>
> Is there a way to truncate a table instead of dropping it and the creating
> it again? I've been looking through the docs and haven't found anything.
>
> Suhail
>
> --
> http://mixpanel.com
> Blog: http://blog.mixpanel.com
>

Re: Truncate Table

Posted by Matt Pestritto <ma...@pestritto.com>.
I've used the following:
Insert overwrite tbl select * from tbl where col1 = 'fake value'

On 4/25/09, Suhail Doshi <di...@gmail.com> wrote:
> Hello,
>
> Is there a way to truncate a table instead of dropping it and the creating
> it again? I've been looking through the docs and haven't found anything.
>
> Suhail
>
> --
> http://mixpanel.com
> Blog: http://blog.mixpanel.com
>