You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Richard A. Bross" <rb...@oaktreepeak.com> on 2018/04/05 13:14:52 UTC
ALTER TABLE DROP PARTITION not working on S3
Hi,
I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so:
hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500);
Dropped the partition daydate=2018-04-04/epoch=1522876500
OK
Time taken: 2.109 seconds
Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this:
hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500');
OK
Time taken: 0.135 seconds
Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this?
Thanks
Re: ALTER TABLE DROP PARTITION not working on S3
Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
No definitely internal. It's loaded from an external table.
----- Original Message -----
From: "Furcy Pin" <pi...@gmail.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:21:06 AM
Subject: Re: ALTER TABLE DROP PARTITION not working on S3
Hi Richard,
could you please check if your table is EXTERNAL?
You can see it with a "DESCRIBE FORMATTED table_name ;"
That's what external tables are for, they don't delete underlying data when you drop them.
On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote:
I think that someone put a file in there manually. Would that prevent Hive from dropping the partition. I also did a "drop table" and the s3 object keys persisted.
----- Original Message -----
From: "Richard A. Bross" < rb@oaktreepeak.com >
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:14:52 AM
Subject: ALTER TABLE DROP PARTITION not working on S3
Hi,
I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so:
hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500);
Dropped the partition daydate=2018-04-04/epoch=1522876500
OK
Time taken: 2.109 seconds
Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this:
hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500');
OK
Time taken: 0.135 seconds
Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this?
Thanks
Re: ALTER TABLE DROP PARTITION not working on S3
Posted by Furcy Pin <pi...@gmail.com>.
¯\_(ツ)_/¯
On 5 April 2018 at 16:02, Richard A. Bross <rb...@oaktreepeak.com> wrote:
> I can't duplicate the issue now. Works like it always has . .
>
> ----- Original Message -----
> From: "Furcy Pin" <pi...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:57:39 AM
> Subject: Re: ALTER TABLE DROP PARTITION not working on S3
>
>
> Indeed.
>
>
> If I remember correctly, s3 does not really have the concept of "folder"
> like HDFS has, and Hive sort of makes up for it by creating a descriptor
> file
> where the partition "folder" is supposed to be. Maybe this is what is
> missing here.
>
>
> Perhaps you could try doing a "MSCK REPAIR TABLE tablename" to make sure
> that the partitions are correctly loaded and then try again dropping that
> particular partition?
>
>
> Or look at your s3 folder if you see any such "partition folder file" and
> check if it is missing for this particular partition?
>
>
>
>
> On 5 April 2018 at 15:40, Richard A. Bross < rb@oaktreepeak.com > wrote:
>
>
> Leaving the column list out, here you go:
>
> # Detailed Table Information
> Database: default
> Owner: hadoop
> CreateTime: Thu Apr 05 13:24:33 UTC 2018
> LastAccessTime: UNKNOWN
> Retention: 0
> Location: s3://zoomi-proto-warehouse-measurements/
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
> numFiles 10
> numPartitions 7
> numRows 153
> orc.compress ZLIB
> orc.create.index true
> rawDataSize 113563
> totalSize 37801
> transient_lastDdlTime 1522934673
>
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.ql.io .orc.OrcSerde
> InputFormat: org.apache.hadoop.hive.ql.io .orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io .orc.OrcOutputFormat
> Compressed: No
> Num Buckets: 61
> Bucket Columns: [crs_id]
> Sort Columns: []
> Storage Desc Params:
> serialization.format 1
> Time taken: 0.467 seconds, Fetched: 98 row(s)
>
>
> ----- Original Message -----
> From: "Furcy Pin" < pin.furcy@gmail.com >
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:21:06 AM
>
>
> Subject: Re: ALTER TABLE DROP PARTITION not working on S3
>
>
> Hi Richard,
>
>
> could you please check if your table is EXTERNAL?
> You can see it with a "DESCRIBE FORMATTED table_name ;"
>
>
> That's what external tables are for, they don't delete underlying data
> when you drop them.
>
>
>
>
> On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote:
>
>
> I think that someone put a file in there manually. Would that prevent Hive
> from dropping the partition. I also did a "drop table" and the s3 object
> keys persisted.
>
>
>
> ----- Original Message -----
> From: "Richard A. Bross" < rb@oaktreepeak.com >
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:14:52 AM
> Subject: ALTER TABLE DROP PARTITION not working on S3
>
> Hi,
>
> I have a Hive managed table on S3, "api_measurements". I've tried dropping
> a partition like so:
>
> hive> alter table api_measurements drop if exists
> partition(daydate='2018-04-04', epoch=1522876500);
> Dropped the partition daydate=2018-04-04/epoch=1522876500
> OK
> Time taken: 2.109 seconds
>
> Yet the data is still on S3. Because object keys on S3 are always strings,
> I also tried this:
>
> hive> alter table api_measurements drop partition(daydate='2018-04-04',
> epoch='1522876500');
> OK
> Time taken: 0.135 seconds
>
> Yet the object keys and data are still there. I assume that Im missing
> something really simple. Can anyone shed some light on this?
>
> Thanks
>
>
>
>
Re: ALTER TABLE DROP PARTITION not working on S3
Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
I can't duplicate the issue now. Works like it always has . .
----- Original Message -----
From: "Furcy Pin" <pi...@gmail.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:57:39 AM
Subject: Re: ALTER TABLE DROP PARTITION not working on S3
Indeed.
If I remember correctly, s3 does not really have the concept of "folder" like HDFS has, and Hive sort of makes up for it by creating a descriptor file
where the partition "folder" is supposed to be. Maybe this is what is missing here.
Perhaps you could try doing a "MSCK REPAIR TABLE tablename" to make sure that the partitions are correctly loaded and then try again dropping that particular partition?
Or look at your s3 folder if you see any such "partition folder file" and check if it is missing for this particular partition?
On 5 April 2018 at 15:40, Richard A. Bross < rb@oaktreepeak.com > wrote:
Leaving the column list out, here you go:
# Detailed Table Information
Database: default
Owner: hadoop
CreateTime: Thu Apr 05 13:24:33 UTC 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: s3://zoomi-proto-warehouse-measurements/
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
numFiles 10
numPartitions 7
numRows 153
orc.compress ZLIB
orc.create.index true
rawDataSize 113563
totalSize 37801
transient_lastDdlTime 1522934673
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io .orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io .orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io .orc.OrcOutputFormat
Compressed: No
Num Buckets: 61
Bucket Columns: [crs_id]
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.467 seconds, Fetched: 98 row(s)
----- Original Message -----
From: "Furcy Pin" < pin.furcy@gmail.com >
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:21:06 AM
Subject: Re: ALTER TABLE DROP PARTITION not working on S3
Hi Richard,
could you please check if your table is EXTERNAL?
You can see it with a "DESCRIBE FORMATTED table_name ;"
That's what external tables are for, they don't delete underlying data when you drop them.
On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote:
I think that someone put a file in there manually. Would that prevent Hive from dropping the partition. I also did a "drop table" and the s3 object keys persisted.
----- Original Message -----
From: "Richard A. Bross" < rb@oaktreepeak.com >
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:14:52 AM
Subject: ALTER TABLE DROP PARTITION not working on S3
Hi,
I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so:
hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500);
Dropped the partition daydate=2018-04-04/epoch=1522876500
OK
Time taken: 2.109 seconds
Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this:
hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500');
OK
Time taken: 0.135 seconds
Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this?
Thanks
Re: ALTER TABLE DROP PARTITION not working on S3
Posted by Furcy Pin <pi...@gmail.com>.
Indeed.
If I remember correctly, s3 does not really have the concept of "folder"
like HDFS has, and Hive sort of makes up for it by creating a descriptor
file
where the partition "folder" is supposed to be. Maybe this is what is
missing here.
Perhaps you could try doing a "MSCK REPAIR TABLE tablename" to make sure
that the partitions are correctly loaded and then try again dropping that
particular partition?
Or look at your s3 folder if you see any such "partition folder file" and
check if it is missing for this particular partition?
On 5 April 2018 at 15:40, Richard A. Bross <rb...@oaktreepeak.com> wrote:
> Leaving the column list out, here you go:
>
> # Detailed Table Information
> Database: default
> Owner: hadoop
> CreateTime: Thu Apr 05 13:24:33 UTC 2018
> LastAccessTime: UNKNOWN
> Retention: 0
> Location: s3://zoomi-proto-warehouse-measurements/
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
> numFiles 10
> numPartitions 7
> numRows 153
> orc.compress ZLIB
> orc.create.index true
> rawDataSize 113563
> totalSize 37801
> transient_lastDdlTime 1522934673
>
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed: No
> Num Buckets: 61
> Bucket Columns: [crs_id]
> Sort Columns: []
> Storage Desc Params:
> serialization.format 1
> Time taken: 0.467 seconds, Fetched: 98 row(s)
>
>
> ----- Original Message -----
> From: "Furcy Pin" <pi...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:21:06 AM
> Subject: Re: ALTER TABLE DROP PARTITION not working on S3
>
>
> Hi Richard,
>
>
> could you please check if your table is EXTERNAL?
> You can see it with a "DESCRIBE FORMATTED table_name ;"
>
>
> That's what external tables are for, they don't delete underlying data
> when you drop them.
>
>
>
>
> On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote:
>
>
> I think that someone put a file in there manually. Would that prevent Hive
> from dropping the partition. I also did a "drop table" and the s3 object
> keys persisted.
>
>
>
> ----- Original Message -----
> From: "Richard A. Bross" < rb@oaktreepeak.com >
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:14:52 AM
> Subject: ALTER TABLE DROP PARTITION not working on S3
>
> Hi,
>
> I have a Hive managed table on S3, "api_measurements". I've tried dropping
> a partition like so:
>
> hive> alter table api_measurements drop if exists
> partition(daydate='2018-04-04', epoch=1522876500);
> Dropped the partition daydate=2018-04-04/epoch=1522876500
> OK
> Time taken: 2.109 seconds
>
> Yet the data is still on S3. Because object keys on S3 are always strings,
> I also tried this:
>
> hive> alter table api_measurements drop partition(daydate='2018-04-04',
> epoch='1522876500');
> OK
> Time taken: 0.135 seconds
>
> Yet the object keys and data are still there. I assume that Im missing
> something really simple. Can anyone shed some light on this?
>
> Thanks
>
>
>
Re: ALTER TABLE DROP PARTITION not working on S3
Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
Leaving the column list out, here you go:
# Detailed Table Information
Database: default
Owner: hadoop
CreateTime: Thu Apr 05 13:24:33 UTC 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: s3://zoomi-proto-warehouse-measurements/
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
numFiles 10
numPartitions 7
numRows 153
orc.compress ZLIB
orc.create.index true
rawDataSize 113563
totalSize 37801
transient_lastDdlTime 1522934673
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: 61
Bucket Columns: [crs_id]
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.467 seconds, Fetched: 98 row(s)
----- Original Message -----
From: "Furcy Pin" <pi...@gmail.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:21:06 AM
Subject: Re: ALTER TABLE DROP PARTITION not working on S3
Hi Richard,
could you please check if your table is EXTERNAL?
You can see it with a "DESCRIBE FORMATTED table_name ;"
That's what external tables are for, they don't delete underlying data when you drop them.
On 5 April 2018 at 15:18, Richard A. Bross < rb@oaktreepeak.com > wrote:
I think that someone put a file in there manually. Would that prevent Hive from dropping the partition. I also did a "drop table" and the s3 object keys persisted.
----- Original Message -----
From: "Richard A. Bross" < rb@oaktreepeak.com >
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:14:52 AM
Subject: ALTER TABLE DROP PARTITION not working on S3
Hi,
I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so:
hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500);
Dropped the partition daydate=2018-04-04/epoch=1522876500
OK
Time taken: 2.109 seconds
Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this:
hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500');
OK
Time taken: 0.135 seconds
Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this?
Thanks
Re: ALTER TABLE DROP PARTITION not working on S3
Posted by Furcy Pin <pi...@gmail.com>.
Hi Richard,
could you please check if your table is EXTERNAL?
You can see it with a "DESCRIBE FORMATTED table_name ;"
That's what external tables are for, they don't delete underlying data when
you drop them.
On 5 April 2018 at 15:18, Richard A. Bross <rb...@oaktreepeak.com> wrote:
> I think that someone put a file in there manually. Would that prevent
> Hive from dropping the partition. I also did a "drop table" and the s3
> object keys persisted.
>
> ----- Original Message -----
> From: "Richard A. Bross" <rb...@oaktreepeak.com>
> To: user@hive.apache.org
> Sent: Thursday, April 5, 2018 9:14:52 AM
> Subject: ALTER TABLE DROP PARTITION not working on S3
>
> Hi,
>
> I have a Hive managed table on S3, "api_measurements". I've tried
> dropping a partition like so:
>
> hive> alter table api_measurements drop if exists
> partition(daydate='2018-04-04', epoch=1522876500);
> Dropped the partition daydate=2018-04-04/epoch=1522876500
> OK
> Time taken: 2.109 seconds
>
> Yet the data is still on S3. Because object keys on S3 are always
> strings, I also tried this:
>
> hive> alter table api_measurements drop partition(daydate='2018-04-04',
> epoch='1522876500');
> OK
> Time taken: 0.135 seconds
>
> Yet the object keys and data are still there. I assume that Im missing
> something really simple. Can anyone shed some light on this?
>
> Thanks
>
>
Re: ALTER TABLE DROP PARTITION not working on S3
Posted by "Richard A. Bross" <rb...@oaktreepeak.com>.
I think that someone put a file in there manually. Would that prevent Hive from dropping the partition. I also did a "drop table" and the s3 object keys persisted.
----- Original Message -----
From: "Richard A. Bross" <rb...@oaktreepeak.com>
To: user@hive.apache.org
Sent: Thursday, April 5, 2018 9:14:52 AM
Subject: ALTER TABLE DROP PARTITION not working on S3
Hi,
I have a Hive managed table on S3, "api_measurements". I've tried dropping a partition like so:
hive> alter table api_measurements drop if exists partition(daydate='2018-04-04', epoch=1522876500);
Dropped the partition daydate=2018-04-04/epoch=1522876500
OK
Time taken: 2.109 seconds
Yet the data is still on S3. Because object keys on S3 are always strings, I also tried this:
hive> alter table api_measurements drop partition(daydate='2018-04-04', epoch='1522876500');
OK
Time taken: 0.135 seconds
Yet the object keys and data are still there. I assume that Im missing something really simple. Can anyone shed some light on this?
Thanks