Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2019/01/29 02:59:00 UTC

[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

     [ https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-21029:
------------------------------------
    Description: 
Existing deployments that use Hive replication do not get external tables replicated. To enable external table replication, such deployments must set a specific switch that first bootstraps the external tables as part of an incremental replication cycle; subsequent incremental replication then picks up further changes to external tables.

The switch will be provided by an additional Hive configuration (for example, hive.repl.bootstrap.external.tables) and is to be used in the {code}WITH{code} clause of the {code}REPL DUMP{code} command.

Additionally, the existing Hive config _hive.repl.include.external.tables_ must always be set to "true" in the above clause.

Proposed usage for enabling external table replication on an existing DLM replication policy:
1. Consider an ongoing repl policy <db1> in the incremental phase.
Set hive.repl.include.external.tables=true and hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
- The dump includes all events but skips events related to external tables.
- Instead, it adds a combined bootstrap dump of all external tables under the "_bootstrap" directory.
- It also includes the data locations file "_external_tables_info".
- The LIMIT and TO clauses must not be used, so that the latest events are dumped before the external tables are bootstrap dumped.

2. REPL LOAD on this dump applies all the events first, copies the external table data, and then bootstraps the external tables (metadata).
- The external tables (metadata) may not be point-in-time consistent with the rest of the tables.
- However, they become eventually consistent once the next incremental load is applied.
- This REPL LOAD is fault tolerant and can be retried if it fails.

3. All future REPL DUMPs on this repl policy should set hive.repl.bootstrap.external.tables=false.
- If it is not set to false, the target might end up with an inconsistent set of external tables, because bootstrap does not clean up dropped external tables.
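
The three steps above could look roughly like the statements below. This is only a sketch, assuming the Hive 3.x REPL DUMP ... FROM syntax and assuming the proposed config name hive.repl.bootstrap.external.tables is adopted as written. The event IDs (1000, 1200) and the dump directory path are hypothetical placeholders, and the database is assumed to be named db1 on both clusters.

{code}
-- Step 1 (source cluster): one-time incremental dump that also bootstraps external tables.
-- 1000 is a hypothetical last-replicated event ID; no TO or LIMIT clause is given.
REPL DUMP db1 FROM 1000 WITH (
  'hive.repl.include.external.tables'='true',
  'hive.repl.bootstrap.external.tables'='true'
);

-- Step 2 (target cluster): load that dump. The path is a hypothetical dump directory.
REPL LOAD db1 FROM '/user/hive/repl/dump_dir';

-- Step 3 (source cluster): every later incremental dump turns the bootstrap switch off.
-- 1200 is a hypothetical event ID returned by the previous dump.
REPL DUMP db1 FROM 1200 WITH (
  'hive.repl.include.external.tables'='true',
  'hive.repl.bootstrap.external.tables'='false'
);
{code}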

  was:
Existing deployments that use Hive replication do not get external tables replicated. To enable external table replication, such deployments must set a specific switch that first bootstraps the external tables as part of an incremental replication cycle; subsequent incremental replication then picks up further changes to external tables.

The switch will be provided by an additional Hive configuration (for example, hive.repl.bootstrap.external.tables) and is to be used in the {code}WITH{code} clause of the {code}REPL DUMP{code} command.

Additionally, the existing Hive config _hive.repl.include.external.tables_ must always be set to "true" in the above clause.


> External table replication for existing deployments running incremental replication.
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-21029
>                 URL: https://issues.apache.org/jira/browse/HIVE-21029
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0, 3.1.0, 3.1.1
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>            Priority: Critical
>             Fix For: 4.0.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)