Posted to issues@hive.apache.org by "Gopal Vijayaraghavan (Jira)" <ji...@apache.org> on 2021/11/03 15:29:00 UTC

[jira] [Commented] (HIVE-25669) After Insert overwrite (managed table), the previous data of the table is not deleted

    [ https://issues.apache.org/jira/browse/HIVE-25669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438138#comment-17438138 ] 

Gopal Vijayaraghavan commented on HIVE-25669:
---------------------------------------------

bq. Why aren't my old folders (base_0000009, base_00000010) deleted?

https://cwiki.apache.org/confluence/display/hive/hive+transactions#HiveTransactions-BaseandDeltaDirectories
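
As a rough illustration of what the linked page describes: on a transactional table, INSERT OVERWRITE writes a new base_N directory instead of deleting the existing ones in place. Readers pick the newest base visible to them, and older directories are left for later background cleanup. The Python below is only a sketch of that selection rule under a simplifying assumption (every base is committed and visible). It is not Hive's actual implementation. The name base_0000011 is hypothetical, since the report does not give the name of the newest directory, and split_bases is a made-up helper.

```python
import re

# Matches ACID base directory names such as "base_0000009".
BASE_DIR = re.compile(r"^base_(\d+)$")


def split_bases(dir_names):
    """Return (current_base, obsolete_bases) for a list of directory names.

    Simplifying assumption: every base directory belongs to a committed
    write that is visible to the reader, so the highest write ID wins.
    """
    bases = []
    for name in dir_names:
        match = BASE_DIR.match(name)
        if match:
            bases.append((int(match.group(1)), name))
    if not bases:
        return None, []
    bases.sort()
    current = bases[-1][1]
    obsolete = [name for _, name in bases[:-1]]
    return current, obsolete


# The first two names come from the report; base_0000011 is hypothetical.
dirs = ["base_0000009", "base_00000010", "base_0000011"]
current, obsolete = split_bases(dirs)
print("read from:", current)
print("awaiting cleanup:", obsolete)
```

Under this reading, the older base directories can still be present on HDFS right after the statement finishes, even though queries no longer read from them.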

> After Insert overwrite (managed table), the previous data of the table is not deleted
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-25669
>                 URL: https://issues.apache.org/jira/browse/HIVE-25669
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.1.0
>         Environment: 1. Hadoop ecosystem versions
>   - Hive : 3.1.0
>   - Tez : 0.9.1
>   - HDFS : 3.1.1
> 2. Table info
>   - table name : test_t1    (*sample name)
>   - table : Managed table
>   - partitioning : none (non-partitioned)
> 3. Table properties
>   - transactional = true
>   - transactional_properties = insert_only
>   - bucketing_version = 2
>   - auto.purge = true / false  (*tried both)
>  
>            Reporter: Jihoon Lee
>            Priority: Minor
>
> After INSERT OVERWRITE on the table, 'auto.purge' does not seem to take effect.
> h2. 1. Create table
> create table test_t1 (
> col1 string,
> col2 string,
> col3 string,
> col4 string
> )
>  
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
> 'hdfs://nameservice1/user/hive/warehouse/st.db/test_ljh5'
> TBLPROPERTIES (
> 'auto.purge'='false',
> 'bucketing_version'='2',
> 'transactional'='true',
> 'transactional_properties'='insert_only')
>  
> h2. 2. Insert overwrite
>  2-1)
> insert overwrite table test_t1 
> select * from origin_t1 limit 10000;
>  2-2)
> insert overwrite table test_t1 
> select * from origin_t1 limit 20000;
>  2-3)
> insert overwrite table test_t1 
> select * from origin_t1 limit 30000;
>  
> h2. 3. Check HDFS files
>  - Hue file browser 
>   [Screenshot from the Hue file browser; embedded image not included]
> Why aren't my old folders (base_0000009, base_00000010) deleted?
> The result is the same whether I set '*auto.purge=true*' or '*auto.purge=false*'.
>  
> I referred to this documentation:
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML]
>  * INSERT OVERWRITE will overwrite any existing data in the table or partition
>  ** unless {{IF NOT EXISTS}} is provided for a partition (as of Hive 0.9.0).
>  ** As of Hive 2.3.0 (HIVE-15880), if the table has [TBLPROPERTIES|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-listTableProperties] ("auto.purge"="true") the previous data of the table is not moved to Trash when INSERT OVERWRITE query is run against the table. This functionality is applicable only for managed tables (see [managed tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ManagedandExternalTables]) and is turned off when "auto.purge" property is unset or set to false.


