Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2019/04/23 09:04:00 UTC

[jira] [Resolved] (HIVE-20968) Support conversion of managed to external where location set was not owned by hive

     [ https://issues.apache.org/jira/browse/HIVE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan resolved HIVE-20968.
-------------------------------------
    Resolution: Won't Fix
      Assignee: mahesh kumar behera  (was: Sankar Hariappan)

Hive replication cannot handle this scenario for the following reasons:
- Partitions can be added even after the table has been bootstrapped to the replica. If the location of any newly added partition is not owned by the “hive” user, the replication process would need to convert the table from managed full ACID to external at the replica, which is not allowed.
- A user may drop a partition whose location is not owned by “hive”. If the table was already bootstrapped, it would be an external table at the replica. After the partition is dropped, the table should be converted back to a managed full ACID table at the replica, which is also not allowed.
- A user may manually change the owner or permissions of a table/partition location at the primary, if the location was set by them. The replication process cannot track those changes.
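
The first two points can be illustrated with a small sketch. It is not Hive code: the Table class, the table_type_by_rule function and the "etl_user" owner are hypothetical names, and the rule is simplified to the ownership check only (the "outside the managed warehouse" condition is assumed to hold already). It shows how the table type the rule implies flips as partitions are added and dropped:

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed value of the migration owner; "hive" is the default named in this issue.
HIVE_OWNER = "hive"


@dataclass
class Table:
    """Hypothetical model: owner of the table location plus owners of partition locations."""
    location_owner: str
    partition_owners: Dict[str, str] = field(default_factory=dict)


def table_type_by_rule(table: Table, migration_owner: str = HIVE_OWNER) -> str:
    """Table type the migration rule would imply, looking at ownership only."""
    owners = [table.location_owner, *table.partition_owners.values()]
    if any(owner != migration_owner for owner in owners):
        return "EXTERNAL"
    return "MANAGED"


# Bootstrap: every location is owned by "hive", so the replica creates a managed table.
t = Table(location_owner="hive")
history = [table_type_by_rule(t)]

# Incremental event: a partition is added whose location is owned by another user.
t.partition_owners["p=1"] = "etl_user"
history.append(table_type_by_rule(t))

# Incremental event: that partition is dropped again.
del t.partition_owners["p=1"]
history.append(table_type_by_rule(t))

print(history)  # ['MANAGED', 'EXTERNAL', 'MANAGED']
```

Each change in the printed history would require a managed-to-external or external-to-managed conversion at the replica, which is the operation that is not allowed.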
 
Due to the above constraints, we *propose not to honour this migration rule during Hive replication*.
Closing this jira as "Won't do".
cc [~thejas], [~anishek], [~ashutosh.bapat]

> Support conversion of managed to external where location set was not owned by hive
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-20968
>                 URL: https://issues.apache.org/jira/browse/HIVE-20968
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: DR, pull-request-available
>         Attachments: HIVE-20968.01.patch, HIVE-20968.02.patch, HIVE-20968.03.patch, HIVE-20968.04.patch, HIVE-20968.05.patch
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As per the migration rule, if a table location is outside the default managed table directory and is not owned by the "hive" user, the table should be converted to an external table after upgrade.
>  The same rule applies to Hive replication when the data of the source managed table resides outside the default warehouse directory and is not owned by the "hive" user.
>  During this conversion, the path should be preserved at the target as well so that failover works seamlessly.
>  # If the table location is outside the Hive warehouse and is not owned by hive, the table at the target will be converted to an external table. However, the location cannot be retained as-is; it will be retained relative to the Hive external warehouse directory.
>  # As the table is not an external table at the source, only data added through events will be replicated.
>  # The ownership of the location will be stored in the create table event and compared with strict.managed.tables.migration.owner to decide whether the flag in the replication scope can be set. This flag is used to convert the managed table to an external table at the target.
> Some scenarios need to be blocked if the database is set up for replication from a cluster without the strict managed table setting to a cluster with it.
> 1. Block alter table / partition set location on managed tables for a database that has source of replication set.
> 2. If a user manually changes the ownership of the location, Hive replication may go into a non-recoverable state.
> 3. Block add partition for managed tables if the ownership of the partition location differs from that of the table location.
> 4. The user needs to set strict.managed.tables.migration.owner along with the dump command (defaults to the hive user). This value will be used during dump to decide the ownership, which will in turn be used during load to decide the table type. The location owner information can be stored in the events during create table. The flag can be stored in the replication spec. Check other such configs used in the upgrade tool.
> 5. Block conversion from managed to external and vice versa. Pass a flag in the upgrade flow to allow this conversion during upgrade.
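
The conversion rule and the path handling described in the issue can be sketched as below. This is only one reading of the description, not the patch attached to the issue: the function names, the warehouse roots and the example paths are all hypothetical, and "retained relative to the external warehouse directory" is assumed to mean that the source path is appended under the external root.

```python
import posixpath


def is_outside(path: str, root: str) -> bool:
    """True if path is neither root itself nor located below root."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return not (path == root or path.startswith(root + "/"))


def should_convert_to_external(location: str, owner: str, managed_root: str,
                               migration_owner: str = "hive") -> bool:
    """Rule from the issue: location outside the managed warehouse AND owner
    different from the configured migration owner (default assumed to be "hive")."""
    return is_outside(location, managed_root) and owner != migration_owner


def target_location(source_location: str, external_root: str) -> str:
    """Assumed interpretation: keep the source path, re-rooted under the
    external warehouse directory of the target cluster."""
    relative = posixpath.normpath(source_location).lstrip("/")
    return posixpath.join(external_root, relative)


# Hypothetical cluster layout, for illustration only.
MANAGED_ROOT = "/warehouse/managed"
EXTERNAL_ROOT = "/warehouse/external"

convert = should_convert_to_external("/data/etl/sales", "etl_user", MANAGED_ROOT)
print(convert)  # True
print(target_location("/data/etl/sales", EXTERNAL_ROOT))  # /warehouse/external/data/etl/sales
```

In this sketch the owner passed to should_convert_to_external stands in for the value recorded in the create table event at dump time, and the boolean result stands in for the flag carried in the replication spec.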



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)