Posted to dev@hive.apache.org by "Pravin Sinha (Jira)" <ji...@apache.org> on 2020/09/21 21:49:00 UTC
[jira] [Created] (HIVE-24187) Handle _files creation for HA config
with same nameservice on source and destination
Pravin Sinha created HIVE-24187:
-----------------------------------
Summary: Handle _files creation for HA config with same nameservice on source and destination
Key: HIVE-24187
URL: https://issues.apache.org/jira/browse/HIVE-24187
Project: Hive
Issue Type: Improvement
Reporter: Pravin Sinha
Assignee: Pravin Sinha
Currently, HA is supported only when the source and destination use different nameservices. We need to add support for the same nameservice on both source and destination.
The local nameservice will be passed to the repl command unchanged.
The remote nameservice will be referred to by an arbitrary name, along with the corresponding configs for that name.
Example:
Clusters originally configured with the following nameservices for HDFS:
src: ns1
target : ns1
We can denote the remote nameservice by an arbitrary name, for example nsRemote. This is how each command will see the nameservices with respect to source and target:
Repl Dump : src: ns1, target: nsRemote
Repl Load: src: nsRemote, target: ns1
Entries in _files (for managed table data locations) will be written with nsRemote instead of ns1 (for src).
Example: hdfs://nsRemote/whLoc/dbName.db/table1:checksum:subDir:hdfs://nsRemote/cmroot
In the same way, the list of external table data locations will also be modified to use nsRemote instead of ns1 (for src).
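A minimal sketch of the rewrite described above, not the actual Hive implementation. It assumes the entry layout matches the example, so it rewrites only the hdfs://<nameservice> authority and leaves the checksum and subDir fields untouched. The function name replace_nameservice is hypothetical.

```python
import re


def replace_nameservice(entry: str, local_ns: str, remote_ns: str) -> str:
    """Rewrite every hdfs://<local_ns> authority in entry to hdfs://<remote_ns>.

    Hypothetical helper for illustration only. The lookahead ensures that
    'ns1' does not also match a longer name such as 'ns10'.
    """
    pattern = re.compile(r"hdfs://" + re.escape(local_ns) + r"(?=/|$)")
    # A callable replacement avoids any escape handling of remote_ns.
    return pattern.sub(lambda _match: "hdfs://" + remote_ns, entry)


# Entry as it would look with the source's own nameservice (ns1).
entry = "hdfs://ns1/whLoc/dbName.db/table1:checksum:subDir:hdfs://ns1/cmroot"
print(replace_nameservice(entry, "ns1", "nsRemote"))
```

The same helper could be applied to each external table data location, since those are plain hdfs:// paths as well.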
New configs can control the behavior:
*hive.repl.ha.datapath.replace.remote.nameservice = <boolean>*
*hive.repl.ha.datapath.replace.remote.nameservice.name = <string>*
Nameservice replacement is performed based on these configs.
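One possible reading of how the two configs interact, sketched under assumptions: the boolean is taken to arrive as the string "true" or "false", replacement is taken to apply to any hdfs:// location, and a missing name is treated as an error when replacement is enabled. None of these details are confirmed by this issue. The function effective_location is hypothetical.

```python
from urllib.parse import urlsplit

# Config keys as proposed in this issue.
REPLACE_KEY = "hive.repl.ha.datapath.replace.remote.nameservice"
NAME_KEY = "hive.repl.ha.datapath.replace.remote.nameservice.name"


def effective_location(location: str, conf: dict) -> str:
    """Return location with its nameservice replaced when the configs ask for it.

    Hypothetical helper: conf is a plain dict standing in for the Hive conf.
    """
    # Assumption: the flag is a string and defaults to "false" when unset.
    if conf.get(REPLACE_KEY, "false").strip().lower() != "true":
        return location
    remote_ns = conf.get(NAME_KEY, "").strip()
    if not remote_ns:
        # Assumption: an enabled flag without a name is a configuration error.
        raise ValueError(NAME_KEY + " must be set when " + REPLACE_KEY + " is true")
    parts = urlsplit(location)
    if parts.scheme != "hdfs":
        return location  # leave non-HDFS locations untouched
    return parts._replace(netloc=remote_ns).geturl()


conf = {REPLACE_KEY: "true", NAME_KEY: "nsRemote"}
print(effective_location("hdfs://ns1/whLoc/ext/table2", conf))
```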
This will also require that 'hive.repl.rootdir' is passed accordingly during dump and load:
Repl dump:
||Repl Operation||Repl Command||
|*Staging on source cluster*|
|Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
|Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
|*Staging on target cluster*|
|Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
|Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
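The table above can be summarized by one rule: the cluster that hosts the staging directory addresses it through its local nameservice, and the other cluster addresses it through the remote name. The sketch below generates the command strings from that rule. The function repl_commands and its parameters are hypothetical, and the defaults ns1, nsRemote and /stagingLoc are simply the example values from this issue.

```python
def repl_commands(db: str, staging_on: str, local_ns: str = "ns1",
                  remote_ns: str = "nsRemote", staging_path: str = "/stagingLoc"):
    """Build the (dump, load) command strings for the given staging location.

    Hypothetical helper. staging_on must be 'source' or 'target'.
    """
    if staging_on not in ("source", "target"):
        raise ValueError("staging_on must be 'source' or 'target'")
    # Dump runs on the source cluster and load runs on the target cluster.
    # Whichever cluster hosts the staging directory sees it under local_ns.
    dump_ns = local_ns if staging_on == "source" else remote_ns
    load_ns = local_ns if staging_on == "target" else remote_ns
    dump = f"repl dump {db} with('hive.repl.rootdir'='hdfs://{dump_ns}{staging_path}')"
    load = f"repl load {db} into {db} with('hive.repl.rootdir'='hdfs://{load_ns}{staging_path}')"
    return dump, load


for location in ("source", "target"):
    dump_cmd, load_cmd = repl_commands("dbName", location)
    print(location, dump_cmd, load_cmd, sep="\n")
```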