Posted to dev@hive.apache.org by "Pravin Sinha (Jira)" <ji...@apache.org> on 2020/09/21 21:49:00 UTC

[jira] [Created] (HIVE-24187) Handle _files creation for HA config with same nameservice on source and destination

Pravin Sinha created HIVE-24187:
-----------------------------------

             Summary: Handle _files creation for HA config with same nameservice on source and destination
                 Key: HIVE-24187
                 URL: https://issues.apache.org/jira/browse/HIVE-24187
             Project: Hive
          Issue Type: Improvement
            Reporter: Pravin Sinha
            Assignee: Pravin Sinha


Currently, HA is supported only when the source and destination clusters use different nameservices. We need to add support for the same nameservice on source and destination.
The local nameservice will be passed to the repl command under its actual name.
The remote nameservice will be referred to by an arbitrary alias, with the corresponding configs defined for that alias.

Example:
Nameservices originally configured for HDFS on each cluster:
src: ns1
target: ns1

We can denote the remote nameservice with an arbitrary name, for example nsRemote. This is how each command will then see the nameservices of the source and target:

Repl Dump: src: ns1, target: nsRemote
Repl Load: src: nsRemote, target: ns1

Entries in _files (for managed table data locations) will be written with nsRemote instead of ns1 (for src).
Example: hdfs://nsRemote/whLoc/dbName.db/table1:checksum:subDir:hdfs://nsRemote/cmroot

In the same way, the list of external table data locations will be modified to use nsRemote instead of ns1 (for src).
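
A minimal sketch of the path rewrite described above, in Python for illustration only (this is not the Hive implementation). The function name replace_nameservice is hypothetical; it swaps the authority of an hdfs:// URI only when it matches the local nameservice and leaves every other path untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_nameservice(uri: str, local_ns: str, remote_ns: str) -> str:
    """Return uri with local_ns replaced by remote_ns in the authority part.

    Hypothetical helper: only hdfs:// URIs whose authority equals local_ns
    are rewritten; anything else is returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs" or parts.netloc != local_ns:
        return uri
    return urlunsplit(parts._replace(netloc=remote_ns))


# Managed table data location and cmroot, as in the _files example above.
data_loc = replace_nameservice("hdfs://ns1/whLoc/dbName.db/table1", "ns1", "nsRemote")
cm_root = replace_nameservice("hdfs://ns1/cmroot", "ns1", "nsRemote")

# The same rewrite applied to a list of external table data locations.
external_locs = [
    replace_nameservice(loc, "ns1", "nsRemote")
    for loc in ["hdfs://ns1/ext/t1", "hdfs://otherNs/ext/t2"]
]
```

Matching on the parsed authority rather than doing a plain string replace avoids touching a path segment that happens to contain the text "ns1".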

New configs can control the behavior:
*hive.repl.ha.datapath.replace.remote.nameservice = <boolean>*
*hive.repl.ha.datapath.replace.remote.nameservice.name = <string>*

Nameservice replacement is performed based on these two configs.
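
One way the two configs might gate the replacement, sketched in Python for illustration. The config keys are the ones proposed above; the function name resolve_remote_nameservice, the default of "false", and the decision to raise an error when the name is missing are all assumptions, not behavior stated in this issue.

```python
from typing import Mapping, Optional

REPLACE_KEY = "hive.repl.ha.datapath.replace.remote.nameservice"
NAME_KEY = "hive.repl.ha.datapath.replace.remote.nameservice.name"


def resolve_remote_nameservice(conf: Mapping[str, str]) -> Optional[str]:
    """Return the remote nameservice alias, or None if replacement is off.

    Assumption: the boolean defaults to false, and enabling it without
    supplying a name is treated as a configuration error.
    """
    enabled = str(conf.get(REPLACE_KEY, "false")).strip().lower() == "true"
    if not enabled:
        return None
    name = str(conf.get(NAME_KEY, "")).strip()
    if not name:
        raise ValueError(NAME_KEY + " must be set when " + REPLACE_KEY + " is true")
    return name


alias = resolve_remote_nameservice({REPLACE_KEY: "true", NAME_KEY: "nsRemote"})
disabled = resolve_remote_nameservice({})
```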

This will also require that 'hive.repl.rootdir' is passed accordingly during dump and load, depending on which cluster hosts the staging directory:
||Repl Operation||Repl Command||
|*Staging on source cluster*|
|Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
|Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
|*Staging on target cluster*|
|Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
|Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
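
The table above can be summarized by one rule: the cluster that hosts the staging directory sees it under ns1, and the other cluster sees it under nsRemote. A small Python sketch of that rule follows; the function repl_commands and its parameters are hypothetical and only build the command strings shown in the table.

```python
from typing import Tuple


def repl_commands(db: str, staging_dir: str, staging_on_source: bool,
                  local_ns: str = "ns1", remote_ns: str = "nsRemote") -> Tuple[str, str]:
    """Build the (dump, load) command strings for a given staging location.

    Dump runs on the source and load runs on the target, so whichever
    cluster hosts the staging directory addresses it with local_ns and
    the other cluster addresses it with remote_ns.
    """
    if staging_on_source:
        dump_ns, load_ns = local_ns, remote_ns
    else:
        dump_ns, load_ns = remote_ns, local_ns
    dump = "repl dump {0} with('hive.repl.rootdir'='hdfs://{1}/{2}')".format(
        db, dump_ns, staging_dir)
    load = "repl load {0} into {0} with('hive.repl.rootdir'='hdfs://{1}/{2}')".format(
        db, load_ns, staging_dir)
    return dump, load


dump_src, load_src = repl_commands("dbName", "stagingLoc", staging_on_source=True)
dump_tgt, load_tgt = repl_commands("dbName", "stagingLoc", staging_on_source=False)
```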


