Posted to issues@hive.apache.org by "Gopal Vijayaraghavan (Jira)" <ji...@apache.org> on 2021/11/03 15:29:00 UTC
[jira] [Commented] (HIVE-25669) After Insert overwrite (managed table), the previous data of the table is not deleted
[ https://issues.apache.org/jira/browse/HIVE-25669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438138#comment-17438138 ]
Gopal Vijayaraghavan commented on HIVE-25669:
---------------------------------------------
bq. why aren't my old folders(base_0000009, base_00000010) deleted?
https://cwiki.apache.org/confluence/display/hive/hive+transactions#HiveTransactions-BaseandDeltaDirectories
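As a rough illustration of the base-directory behaviour that page describes (a simplified sketch, not Hive's actual implementation): on a transactional table, each INSERT OVERWRITE writes a new base_N directory instead of deleting the previous one in place, and readers use the newest base. Older bases are merely obsolete and, as far as I understand the linked page, are left for Hive's own background cleanup. The function name split_base_dirs and the directory base_0000011 are hypothetical; only base_0000009 and base_00000010 appear in the report. The real logic also takes valid write IDs and delta directories into account, which this sketch ignores.

```python
import re

# Matches directory names such as base_0000009; the digits are the write ID.
BASE_DIR = re.compile(r"^base_(\d+)$")


def split_base_dirs(names):
    """Split a table directory listing into (current_base, obsolete_bases).

    Simplified model: the base with the highest write ID is current,
    every other base directory is treated as obsolete.
    """
    bases = []
    for name in names:
        match = BASE_DIR.match(name)
        if match:
            bases.append((int(match.group(1)), name))
    if not bases:
        return None, []
    bases.sort()  # order by numeric write ID, not by string
    current = bases[-1][1]
    obsolete = [name for _, name in bases[:-1]]
    return current, obsolete


# base_0000011 is a hypothetical name for the directory written by step 2-3.
listing = ["base_0000009", "base_00000010", "base_0000011"]
current, obsolete = split_base_dirs(listing)
print("current:", current)
print("obsolete:", obsolete)
```

In this model the two folders from the report are obsolete but still present on HDFS, which matches what the file browser shows.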
> After Insert overwrite (managed table), the previous data of the table is not deleted
> -------------------------------------------------------------------------------------
>
> Key: HIVE-25669
> URL: https://issues.apache.org/jira/browse/HIVE-25669
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.1.0
> Environment: 1. hadoop eco versions
> - hive : 3.1.0
> - Tez : 0.9.1
> - hdfs : 3.1.1
> 2. Table info
> - table name : test_t1 (*sample name)
> - table : Managed table
> - partitioning : none (non-partitioned table)
> 3. Table properties
> - transactional = true
> - transactional_properties = insert_only
> - bucketing_version = 2
> - auto.purge = true / false (*tried both)
>
> Reporter: Jihoon Lee
> Priority: Minor
>
> When running INSERT OVERWRITE on a table, 'auto.purge' does not seem to take effect.
> h2. 1. Create table
> create table test_t1 (
> col1 string,
> col2 string,
> col3 string,
> col4 string
> )
>
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
> 'hdfs://nameservice1/user/hive/warehouse/st.db/test_ljh5'
> TBLPROPERTIES (
> 'auto.purge'='false',
> 'bucketing_version'='2',
> 'transactional'='true',
> 'transactional_properties'='insert_only')
>
> h2. 2. Insert overwrite
> 2-1)
> insert overwrite table test_t1
> select * from origin_t1 limit 10000;
> 2-2)
> insert overwrite table test_t1
> select * from origin_t1 limit 20000;
> 2-3)
> insert overwrite table test_t1
> select * from origin_t1 limit 30000;
>
> h2. 3. Check HDFS files
> - Hue file browser
> (Screenshot of the Hue file browser; embedded image not available.)
> why aren't my old folders(base_0000009, base_00000010) deleted?
> The result is the same whether I set '*auto.purge=true*' or '*auto.purge=false*'.
>
> I referred to the following documentation:
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML]
> * INSERT OVERWRITE will overwrite any existing data in the table or partition
> ** unless {{IF NOT EXISTS}} is provided for a partition (as of Hive 0.9.0).
> ** As of Hive 2.3.0 (HIVE-15880), if the table has [TBLPROPERTIES|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-listTableProperties] ("auto.purge"="true") the previous data of the table is not moved to Trash when INSERT OVERWRITE query is run against the table. This functionality is applicable only for managed tables (see [managed tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ManagedandExternalTables]) and is turned off when "auto.purge" property is unset or set to false.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)