You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Richard A. Bross" <rb...@oaktreepeak.com> on 2018/04/05 13:14:52 UTC

ALTER TABLE DROP PARTITION not working on S3

Hi,

I have a Hive managed table on S3, "api_measurements".  I've tried dropping a partition like so:

hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500);
Dropped the partition daydate=2018-04-04/epoch=1522876500
OK
Time taken: 2.109 seconds

Yet the data is still on S3.  Because object keys on S3 are always strings, I also tried this:

hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500');
OK
Time taken: 0.135 seconds

Yet the object keys and data are still there.  I assume that Im missing something really simple.  Can anyone shed some light on this?

Thanks


Re: ALTER TABLE DROP PARTITION not working on S3

Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
No definitely internal.  It's loaded from an external table.

----- Original Message -----
From: "Furcy Pin" <pi...@gmail.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:21:06 AM
Subject: Re: ALTER TABLE DROP PARTITION not working on S3


Hi Richard, 


could you please check if your table is EXTERNAL? 
You can see it with a "DESCRIBE FORMATTED table_name ;" 


That's what external tables are for, they don't delete underlying data when you drop them. 




On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote: 


I think that someone put a file in there manually. Would that prevent Hive from dropping the partition. I also did a "drop table" and the s3 object keys persisted. 



----- Original Message ----- 
From: "Richard A. Bross" < rb@oaktreepeak.com > 
To: user@hive.apache.org 
Sent: Thursday, April 5, 2018 9:14:52 AM 
Subject: ALTER TABLE DROP PARTITION not working on S3 

Hi, 

I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so: 

hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500); 
Dropped the partition daydate=2018-04-04/epoch=1522876500 
OK 
Time taken: 2.109 seconds 

Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this: 

hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500'); 
OK 
Time taken: 0.135 seconds 

Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this? 

Thanks 



Re: ALTER TABLE DROP PARTITION not working on S3

Posted by Furcy Pin <pi...@gmail.com>.
¯\_(ツ)_/¯

On 5 April 2018 at 16:02, Richard A. Bross <rb...@oaktreepeak.com> wrote:

> I can't duplicate the issue now.  Works like it always has . .
>
> ----- Original Message -----
> From: "Furcy Pin" <pi...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:57:39 AM
> Subject: Re: ALTER TABLE DROP PARTITION not working on S3
>
>
> Indeed.
>
>
> If I remember correctly, s3 does not really have the concept of "folder"
> like HDFS has, and Hive sort of makes up for it by creating a descriptor
> file
> where the partition "folder" is supposed to be. Maybe this is what is
> missing here.
>
>
> Perhaps you could try doing a "MSCK REPAIR TABLE tablename" to make sure
> that the partitions are correctly loaded and then try again dropping that
> particular partition?
>
>
> Or look at your s3 folder if you see any such "partition folder file" and
> check if it is missing for this particular partition?
>
>
>
>
> On 5 April 2018 at 15:40, Richard A. Bross < rb@oaktreepeak.com > wrote:
>
>
> Leaving the column list out, here you go:
>
> # Detailed Table Information
> Database: default
> Owner: hadoop
> CreateTime: Thu Apr 05 13:24:33 UTC 2018
> LastAccessTime: UNKNOWN
> Retention: 0
> Location: s3://zoomi-proto-warehouse-measurements/
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
> numFiles 10
> numPartitions 7
> numRows 153
> orc.compress ZLIB
> orc.create.index true
> rawDataSize 113563
> totalSize 37801
> transient_lastDdlTime 1522934673
>
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.ql.io .orc.OrcSerde
> InputFormat: org.apache.hadoop.hive.ql.io .orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io .orc.OrcOutputFormat
> Compressed: No
> Num Buckets: 61
> Bucket Columns: [crs_id]
> Sort Columns: []
> Storage Desc Params:
> serialization.format 1
> Time taken: 0.467 seconds, Fetched: 98 row(s)
>
>
> ----- Original Message -----
> From: "Furcy Pin" < pin.furcy@gmail.com >
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:21:06 AM
>
>
> Subject: Re: ALTER TABLE DROP PARTITION not working on S3
>
>
> Hi Richard,
>
>
> could you please check if your table is EXTERNAL?
> You can see it with a "DESCRIBE FORMATTED table_name ;"
>
>
> That's what external tables are for, they don't delete underlying data
> when you drop them.
>
>
>
>
> On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote:
>
>
> I think that someone put a file in there manually. Would that prevent Hive
> from dropping the partition. I also did a "drop table" and the s3 object
> keys persisted.
>
>
>
> ----- Original Message -----
> From: "Richard A. Bross" < rb@oaktreepeak.com >
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:14:52 AM
> Subject: ALTER TABLE DROP PARTITION not working on S3
>
> Hi,
>
> I have a Hive managed table on S3, "api_measurements". I've tried dropping
> a partition like so:
>
> hive> alter table api_measurements drop if exists
> partition(daydate='2018-04-04', epoch=1522876500);
> Dropped the partition daydate=2018-04-04/epoch=1522876500
> OK
> Time taken: 2.109 seconds
>
> Yet the data is still on S3. Because object keys on S3 are always strings,
> I also tried this:
>
> hive> alter table api_measurements drop partition(daydate='2018-04-04',
> epoch='1522876500');
> OK
> Time taken: 0.135 seconds
>
> Yet the object keys and data are still there. I assume that Im missing
> something really simple. Can anyone shed some light on this?
>
> Thanks
>
>
>
>

Re: ALTER TABLE DROP PARTITION not working on S3

Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
I can't duplicate the issue now.  Works like it always has . . 

----- Original Message -----
From: "Furcy Pin" <pi...@gmail.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:57:39 AM
Subject: Re: ALTER TABLE DROP PARTITION not working on S3


Indeed. 


If I remember correctly, s3 does not really have the concept of "folder" like HDFS has, and Hive sort of makes up for it by creating a descriptor file 
where the partition "folder" is supposed to be. Maybe this is what is missing here. 


Perhaps you could try doing a "MSCK REPAIR TABLE tablename" to make sure that the partitions are correctly loaded and then try again dropping that particular partition? 


Or look at your s3 folder if you see any such "partition folder file" and check if it is missing for this particular partition? 




On 5 April 2018 at 15:40, Richard A. Bross < rb@oaktreepeak.com > wrote: 


Leaving the column list out, here you go: 

# Detailed Table Information 
Database: default 
Owner: hadoop 
CreateTime: Thu Apr 05 13:24:33 UTC 2018 
LastAccessTime: UNKNOWN 
Retention: 0 
Location: s3://zoomi-proto-warehouse-measurements/ 
Table Type: MANAGED_TABLE 
Table Parameters: 
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} 
numFiles 10 
numPartitions 7 
numRows 153 
orc.compress ZLIB 
orc.create.index true 
rawDataSize 113563 
totalSize 37801 
transient_lastDdlTime 1522934673 

# Storage Information 
SerDe Library: org.apache.hadoop.hive.ql.io .orc.OrcSerde 
InputFormat: org.apache.hadoop.hive.ql.io .orc.OrcInputFormat 
OutputFormat: org.apache.hadoop.hive.ql.io .orc.OrcOutputFormat 
Compressed: No 
Num Buckets: 61 
Bucket Columns: [crs_id] 
Sort Columns: [] 
Storage Desc Params: 
serialization.format 1 
Time taken: 0.467 seconds, Fetched: 98 row(s) 


----- Original Message ----- 
From: "Furcy Pin" < pin.furcy@gmail.com > 
To: user@hive.apache.org 
Sent: Thursday, April 5, 2018 9:21:06 AM 


Subject: Re: ALTER TABLE DROP PARTITION not working on S3 


Hi Richard, 


could you please check if your table is EXTERNAL? 
You can see it with a "DESCRIBE FORMATTED table_name ;" 


That's what external tables are for, they don't delete underlying data when you drop them. 




On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote: 


I think that someone put a file in there manually. Would that prevent Hive from dropping the partition. I also did a "drop table" and the s3 object keys persisted. 



----- Original Message ----- 
From: "Richard A. Bross" < rb@oaktreepeak.com > 
To: user@hive.apache.org 
Sent: Thursday, April 5, 2018 9:14:52 AM 
Subject: ALTER TABLE DROP PARTITION not working on S3 

Hi, 

I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so: 

hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500); 
Dropped the partition daydate=2018-04-04/epoch=1522876500 
OK 
Time taken: 2.109 seconds 

Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this: 

hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500'); 
OK 
Time taken: 0.135 seconds 

Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this? 

Thanks 




Re: ALTER TABLE DROP PARTITION not working on S3

Posted by Furcy Pin <pi...@gmail.com>.
Indeed.

If I remember correctly, s3 does not really have the concept of "folder"
like HDFS has, and Hive sort of makes up for it by creating a descriptor
file
where the partition "folder" is supposed to be. Maybe this is what is
missing here.

Perhaps you could try doing a "MSCK REPAIR TABLE tablename" to make sure
that the partitions are correctly loaded and then try again dropping that
particular partition?

Or look at your s3 folder if you see any such "partition folder file" and
check if it is missing for this particular partition?


On 5 April 2018 at 15:40, Richard A. Bross <rb...@oaktreepeak.com> wrote:

> Leaving the column list out, here you go:
>
> # Detailed Table Information
> Database:               default
> Owner:                  hadoop
> CreateTime:             Thu Apr 05 13:24:33 UTC 2018
> LastAccessTime:         UNKNOWN
> Retention:              0
> Location:               s3://zoomi-proto-warehouse-measurements/
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>         numFiles                10
>         numPartitions           7
>         numRows                 153
>         orc.compress            ZLIB
>         orc.create.index        true
>         rawDataSize             113563
>         totalSize               37801
>         transient_lastDdlTime   1522934673
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:             No
> Num Buckets:            61
> Bucket Columns:         [crs_id]
> Sort Columns:           []
> Storage Desc Params:
>         serialization.format    1
> Time taken: 0.467 seconds, Fetched: 98 row(s)
>
>
> ----- Original Message -----
> From: "Furcy Pin" <pi...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:21:06 AM
> Subject: Re: ALTER TABLE DROP PARTITION not working on S3
>
>
> Hi Richard,
>
>
> could you please check if your table is EXTERNAL?
> You can see it with a "DESCRIBE FORMATTED table_name ;"
>
>
> That's what external tables are for, they don't delete underlying data
> when you drop them.
>
>
>
>
> On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote:
>
>
> I think that someone put a file in there manually. Would that prevent Hive
> from dropping the partition. I also did a "drop table" and the s3 object
> keys persisted.
>
>
>
> ----- Original Message -----
> From: "Richard A. Bross" < rb@oaktreepeak.com >
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:14:52 AM
> Subject: ALTER TABLE DROP PARTITION not working on S3
>
> Hi,
>
> I have a Hive managed table on S3, "api_measurements". I've tried dropping
> a partition like so:
>
> hive> alter table api_measurements drop if exists
> partition(daydate='2018-04-04', epoch=1522876500);
> Dropped the partition daydate=2018-04-04/epoch=1522876500
> OK
> Time taken: 2.109 seconds
>
> Yet the data is still on S3. Because object keys on S3 are always strings,
> I also tried this:
>
> hive> alter table api_measurements drop partition(daydate='2018-04-04',
> epoch='1522876500');
> OK
> Time taken: 0.135 seconds
>
> Yet the object keys and data are still there. I assume that Im missing
> something really simple. Can anyone shed some light on this?
>
> Thanks
>
>
>

Re: ALTER TABLE DROP PARTITION not working on S3

Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
Leaving the column list out, here you go:

# Detailed Table Information             
Database:               default                  
Owner:                  hadoop                   
CreateTime:             Thu Apr 05 13:24:33 UTC 2018     
LastAccessTime:         UNKNOWN                  
Retention:              0                        
Location:               s3://zoomi-proto-warehouse-measurements/         
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        numFiles                10                  
        numPartitions           7                   
        numRows                 153                 
        orc.compress            ZLIB                
        orc.create.index        true                
        rawDataSize             113563              
        totalSize               37801               
        transient_lastDdlTime   1522934673          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde        
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        
Compressed:             No                       
Num Buckets:            61                       
Bucket Columns:         [crs_id]                 
Sort Columns:           []                       
Storage Desc Params:             
        serialization.format    1                   
Time taken: 0.467 seconds, Fetched: 98 row(s)


----- Original Message -----
From: "Furcy Pin" <pi...@gmail.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:21:06 AM
Subject: Re: ALTER TABLE DROP PARTITION not working on S3


Hi Richard, 


could you please check if your table is EXTERNAL? 
You can see it with a "DESCRIBE FORMATTED table_name ;" 


That's what external tables are for, they don't delete underlying data when you drop them. 




On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote: 


I think that someone put a file in there manually. Would that prevent Hive from dropping the partition. I also did a "drop table" and the s3 object keys persisted. 



----- Original Message ----- 
From: "Richard A. Bross" < rb@oaktreepeak.com > 
To: user@hive.apache.org 
Sent: Thursday, April 5, 2018 9:14:52 AM 
Subject: ALTER TABLE DROP PARTITION not working on S3 

Hi, 

I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so: 

hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500); 
Dropped the partition daydate=2018-04-04/epoch=1522876500 
OK 
Time taken: 2.109 seconds 

Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this: 

hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500'); 
OK 
Time taken: 0.135 seconds 

Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this? 

Thanks 



Re: ALTER TABLE DROP PARTITION not working on S3

Posted by Furcy Pin <pi...@gmail.com>.
Hi Richard,

could you please check if your table is EXTERNAL?
You can see it with a "DESCRIBE FORMATTED table_name ;"

That's what external tables are for, they don't delete underlying data when
you drop them.



On 5 April 2018 at 15:18, Richard A. Bross <rb...@oaktreepeak.com> wrote:

> I think that someone put a file in there manually.  Would that prevent
> Hive from dropping the partition.  I also did a "drop table" and the s3
> object keys persisted.
>
> ----- Original Message -----
> From: "Richard A. Bross" <rb...@oaktreepeak.com>
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:14:52 AM
> Subject: ALTER TABLE DROP PARTITION not working on S3
>
> Hi,
>
> I have a Hive managed table on S3, "api_measurements".  I've tried
> dropping a partition like so:
>
> hive> alter table api_measurements drop if exists
> partition(daydate='2018-04-04', epoch=1522876500);
> Dropped the partition daydate=2018-04-04/epoch=1522876500
> OK
> Time taken: 2.109 seconds
>
> Yet the data is still on S3.  Because object keys on S3 are always
> strings, I also tried this:
>
> hive> alter table api_measurements drop partition(daydate='2018-04-04',
> epoch='1522876500');
> OK
> Time taken: 0.135 seconds
>
> Yet the object keys and data are still there.  I assume that Im missing
> something really simple.  Can anyone shed some light on this?
>
> Thanks
>
>

Re: ALTER TABLE DROP PARTITION not working on S3

Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
I think that someone put a file in there manually.  Would that prevent Hive from dropping the partition.  I also did a "drop table" and the s3 object keys persisted.

----- Original Message -----
From: "Richard A. Bross" <rb...@oaktreepeak.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:14:52 AM
Subject: ALTER TABLE DROP PARTITION not working on S3

Hi,

I have a Hive managed table on S3, "api_measurements".  I've tried dropping a partition like so:

hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500);
Dropped the partition daydate=2018-04-04/epoch=1522876500
OK
Time taken: 2.109 seconds

Yet the data is still on S3.  Because object keys on S3 are always strings, I also tried this:

hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500');
OK
Time taken: 0.135 seconds

Yet the object keys and data are still there.  I assume that Im missing something really simple.  Can anyone shed some light on this?

Thanks