Posted to dev@hive.apache.org by "mahesh kumar behera (JIRA)" <ji...@apache.org> on 2019/05/22 03:39:00 UTC

[jira] [Created] (HIVE-21773) Supporting external table replication with partition filter.

mahesh kumar behera created HIVE-21773:
------------------------------------------

             Summary: Supporting external table replication with partition filter.
                 Key: HIVE-21773
                 URL: https://issues.apache.org/jira/browse/HIVE-21773
             Project: Hive
          Issue Type: Sub-task
          Components: HiveServer2, repl
    Affects Versions: 4.0.0
            Reporter: mahesh kumar behera
            Assignee: mahesh kumar behera
             Fix For: 4.0.0


Hive replicates external tables differently from managed tables. For an external table, a list is built of the table and partition locations to be replicated. If a partition location is within the table location, it is not added to the list. If a partition location is outside the table location, it is added to the list. During an incremental dump, data-related events are ignored and only metadata-related events are dumped. The list of locations is prepared and used for replication. During load, the events are replayed and then distcp tasks are created, one for each location in the list.
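
The location-list rule described above can be sketched as follows. This is a minimal illustration in Python, not Hive's actual implementation; the function names (is_within, locations_to_copy) and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_within(child, parent):
    """Return True if path `child` equals `parent` or lies underneath it."""
    c, p = PurePosixPath(child), PurePosixPath(parent)
    return c == p or p in c.parents


def locations_to_copy(table_location, partition_locations):
    """Build the list of locations to replicate for one external table.

    The table location is always listed. A partition location is listed
    only when it lies outside the table location, because partitions
    inside it are already covered by copying the table location.
    """
    locations = [table_location]
    for loc in partition_locations:
        if not is_within(loc, table_location) and loc not in locations:
            locations.append(loc)
    return locations


# Hypothetical layout: one partition inside the table directory, one outside.
example = locations_to_copy(
    "/warehouse/ext/sales",
    [
        "/warehouse/ext/sales/dt=2019-05-01",
        "/data/archive/sales/dt=2018-01-01",
    ],
)
```

Under this sketch, one copy task per entry in the returned list would correspond to the "one distcp task for each location" step during load.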

For partition-level replication, not all partitions will be present in the dump. So even if the partition locations are within the table location, each partition location is added to the list.
 * If a where condition is present in the REPL DUMP command, add the location of each partition that satisfies it, even if the partition location is within the table location.
 * If the table is not mentioned in the where clause, follow the older behavior.
 * If the table is mentioned with a key but the key does not match any of the partition columns, fail the REPL DUMP.
 * If the table is mentioned with a key, add the location of each partition even if all partitions satisfy the filter condition. This avoids copying partitions that are added using ALTER after the dump.
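
The four rules above can be sketched as one decision function. This is a hedged illustration in Python, not the proposed Hive code: the function name locations_for_dump, its parameters, and the representation of the where clause as a key plus a predicate are all assumptions made for the example. It also assumes that the table location itself is left out of the list when a filter applies, which is how the last rule is read here.

```python
from pathlib import PurePosixPath


def _is_within(child, parent):
    """Return True if path `child` equals `parent` or lies underneath it."""
    c, p = PurePosixPath(child), PurePosixPath(parent)
    return c == p or p in c.parents


def locations_for_dump(table_location, partition_columns, partitions,
                       filter_key=None, matches=None):
    """Pick the locations to copy for one external table.

    partitions: list of (spec, location) pairs, spec being a dict such as
                {"dt": "2019-05-01"} (hypothetical representation).
    filter_key: partition column named in the where clause for this table,
                or None if the table is not mentioned there.
    matches:    predicate applied to the partition's value for filter_key.
    """
    if filter_key is None:
        # Table not mentioned in the where clause: older behavior.
        locations = [table_location]
        for _spec, loc in partitions:
            if not _is_within(loc, table_location) and loc not in locations:
                locations.append(loc)
        return locations

    if filter_key not in partition_columns:
        # Key does not match any partition column: fail the dump.
        raise ValueError("unknown partition column: " + filter_key)

    # Filter present: list every satisfying partition individually, even
    # those inside the table location, so partitions added after the dump
    # are not copied along with the table directory.
    return [loc for spec, loc in partitions if matches(spec[filter_key])]


parts = [
    ({"dt": "2019-05-01"}, "/w/sales/dt=2019-05-01"),
    ({"dt": "2019-05-02"}, "/w/sales/dt=2019-05-02"),
    ({"dt": "2018-01-01"}, "/archive/sales/dt=2018-01-01"),
]
unfiltered = locations_for_dump("/w/sales", ["dt"], parts)
filtered = locations_for_dump("/w/sales", ["dt"], parts,
                              "dt", lambda v: v >= "2019-05-01")
```

Listing partitions one by one in the filtered case trades a longer location list for a copy that is limited to what the dump actually contains.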



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)