Posted to issues@hive.apache.org by "Hui An (JIRA)" <ji...@apache.org> on 2019/08/02 08:36:00 UTC
[jira] [Updated] (HIVE-22077) Inserting overwrite partitions clause
does not clean directories while partitions' info is not stored in metadata
[ https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hui An updated HIVE-22077:
--------------------------
Description:
INSERT OVERWRITE into a static partition may not clean the partition's HDFS location if the partition's information is not stored in the metastore.
Steps to reproduce this issue:
------------------------------------------------
1. Create a managed table:
------------------------------------------------
{code:sql}
CREATE TABLE `test`(
`id` string)
PARTITIONED BY (
`dayno` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
TBLPROPERTIES (
'transient_lastDdlTime'='1564731656')
{code}
------------------------------------------------
2. Create the partition directory and put some data in it:
------------------------------------------------
{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}
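To confirm the precondition for this bug (the new directory exists on HDFS but is not registered as a partition), the metastore can be queried before step 3. This is a minimal sketch, assuming the Hive CLI is on the PATH (Beeline's {{-e}} option works similarly) and that the table lives in a database named {{test}}. That database name is only inferred from the {{test.db}} part of the table location.
{code:bash}
# Assumption: the database is "test", inferred from .../test.db/test.
# Expected: dayno=20190802 is NOT listed, because the directory was created
# directly on HDFS and was never registered in the metastore.
hive -e "SHOW PARTITIONS test.test;"
{code}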
------------------------------------------------
3. Run INSERT OVERWRITE on partition dayno=20190802:
------------------------------------------------
{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}
------------------------------------------------
4. Observe that test.data under the partition directory has not been deleted.
------------------------------------------------
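Step 4 can be checked from the command line. This is a minimal sketch that uses only the standard {{hdfs dfs -ls}} and {{hdfs dfs -test -e}} subcommands. The {{PART_DIR}} variable is introduced here only for readability.
{code:bash}
PART_DIR=hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802

# List the partition directory: with the bug, test.data is still present
# next to the file(s) written by the INSERT OVERWRITE in step 3.
hdfs dfs -ls "$PART_DIR"

# "hdfs dfs -test -e" exits with status 0 if the path exists.
if hdfs dfs -test -e "$PART_DIR/test.data"; then
  echo "test.data survived INSERT OVERWRITE: issue reproduced"
else
  echo "test.data was removed: partition directory was cleaned as expected"
fi
{code}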
was:
Identical to the current description, except that step 3 used {{SELECT 1;}} instead of {{SELECT "some value";}}.
> Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 4.0.0, 2.3.4
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)