Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2019/06/15 18:24:00 UTC

[jira] [Commented] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

    [ https://issues.apache.org/jira/browse/HIVE-21763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864825#comment-16864825 ] 

Sankar Hariappan commented on HIVE-21763:
-----------------------------------------

01.patch implements the following:
1. REPL DUMP changes to accept the REPLACE clause and trigger a bootstrap of the tables newly included by the new replication policy.
2. REPL LOAD changes to detect tables that are excluded by the new replication policy and drop them.
3. Writing and reading the new replication policy in the _dumpmetadata file.
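
For readers following along, items 1 and 2 both reduce to a set difference between the tables matched by the previous policy and those matched by the current one. The sketch below is only an illustration in Python, not the code in the patch: the `ReplPolicy` class, its include/exclude regex fields, and `diff_policies` are hypothetical names, and the real policy formats are the ones defined in the parent issue.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class ReplPolicy:
    """Hypothetical policy model: an include regex plus an optional exclude regex."""
    include: str
    exclude: Optional[str] = None

    def matches(self, table: str) -> bool:
        # A table is replicated if it matches the include pattern
        # and does not match the exclude pattern (when one is given).
        if re.fullmatch(self.include, table) is None:
            return False
        return self.exclude is None or re.fullmatch(self.exclude, table) is None


def diff_policies(
    tables: Iterable[str], previous: ReplPolicy, current: ReplPolicy
) -> Tuple[Set[str], Set[str]]:
    """Return (tables to bootstrap, tables to drop) for a policy change."""
    tables = list(tables)
    old = {t for t in tables if previous.matches(t)}
    new = {t for t in tables if current.matches(t)}
    # Newly included tables need a bootstrap dump; newly excluded ones are dropped on load.
    return new - old, old - new
```

With tables `t1`, `t2`, `t3` and `orders`, moving from include `t[0-9]+` / exclude `t3` to include `t[0-9]+|orders` / exclude `t1` would mark `t3` and `orders` for bootstrap and `t1` for dropping.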

> Incremental replication to allow changing include/exclude tables list in replication policy.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21763
>                 URL: https://issues.apache.org/jira/browse/HIVE-21763
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, Replication, pull-request-available
>         Attachments: HIVE-21763.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> - REPL DUMP takes 2 inputs along with the existing FROM and WITH clauses.
> {code}
> - REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM <last_repl_id> WITH <key_values_list>;
> - current_repl_policy and previous_repl_policy can be any format mentioned in Point-4.
> - REPLACE clause to be supported to take previous repl policy as input. If REPLACE clause is not there, then the policy remains unchanged.
> - Rest of the format remains same.
> {code}
> - Now, REPL DUMP on this DB will replicate the tables based on current_repl_policy.
> - Single table replication of format <db_name>.t1 doesn’t allow changing the policy dynamically. So the REPLACE clause is not allowed if previous_repl_policy is of this format.
> - Any table that is added dynamically, either because the regular expression changed or because it was added to the include list, should be bootstrapped using an independent table-level replication policy.
> {code}
> - Hive will automatically figure out the list of tables newly included in the list by comparing the current_repl_policy & previous_repl_policy inputs and combine bootstrap dump for added tables as part of incremental dump. "_bootstrap" directory can be created in dump dir to accommodate all tables to be bootstrapped.
> - If any table is renamed, then it may get dynamically added/removed for replication based on the defined replication policy + include/exclude list. So, Hive will perform a bootstrap for a table that becomes included after a rename.
> {code}
> - REPL LOAD should check for changes in the repl policy and drop the tables/views that are excluded in the new policy compared to the previous policy. This should be done before performing the incremental and bootstrap load from the current dump.
> - REPL LOAD on an incremental dump should load the event directories first and then check for the "_bootstrap" directory and perform a bootstrap load on it.
> Rename table is not in scope of this jira.
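
The load order described in the last two bullets (drop excluded tables, then event directories, then "_bootstrap") can be sketched as a small planning function. This is a rough Python illustration only, not Hive code: `plan_repl_load` is a hypothetical helper, and it assumes event directories are named by their numeric event id, which the description does not state.

```python
from typing import Iterable, List, Tuple

BOOTSTRAP_DIR = "_bootstrap"  # directory name taken from the description


def plan_repl_load(
    dump_entries: Iterable[str], excluded_tables: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return the ordered (action, target) steps for loading one incremental dump."""
    entries = list(dump_entries)
    # 1. Drop tables/views excluded by the new policy before loading anything.
    steps = [("drop", table) for table in sorted(excluded_tables)]
    # 2. Load event directories in event-id order (assumed: numeric directory names).
    event_dirs = sorted((e for e in entries if e.isdigit()), key=int)
    steps += [("event", e) for e in event_dirs]
    # 3. Only then bootstrap the newly included tables, if the directory exists.
    if BOOTSTRAP_DIR in entries:
        steps.append(("bootstrap", BOOTSTRAP_DIR))
    return steps
```

Entries that are neither numeric nor "_bootstrap", such as the _dumpmetadata file, are ignored by this sketch.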



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)