Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/02 17:21:14 UTC

[GitHub] [hudi] xiaozhch5 commented on a change in pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

xiaozhch5 commented on a change in pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#discussion_r817901694



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -77,6 +77,10 @@ public HiveSyncTool(HiveSyncConfig cfg, HiveConf configuration, FileSystem fs) {
     super(configuration.getAllProperties(), fs);
 
     try {
+      if (cfg.useKerberos) {
+        configuration.set("hive.metastore.sasl.enabled", "true");
+        configuration.set("hive.metastore.kerberos.principal", cfg.kerberosPrincipal);

Review comment:
       Hello, I tested the PR and could not sync Hudi to Hive 3 with it, but I managed to do so with the following change to the configuration code and the parameters described below.
   ```java
   if (cfg.enableKerberos) {
     // Point the JVM at the Kerberos client configuration.
     System.setProperty("java.security.krb5.conf", cfg.krb5Conf);
     // Switch Hadoop security to Kerberos and log in from the keytab.
     Configuration conf = new Configuration();
     conf.set("hadoop.security.authentication", "kerberos");
     conf.set("kerberos.principal", cfg.principal);
     UserGroupInformation.setConfiguration(conf);
     UserGroupInformation.loginUserFromKeytab(cfg.keytabName, cfg.keytabFile);
     // Enable SASL for the metastore client and pass it the metastore principal and keytab.
     configuration.set(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL.varname, "true");
     configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname, cfg.principal);
     configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_KEYTAB_FILE.varname, cfg.keytabFile);
   }
   ```
   The following is an explanation of each parameter:
   
   * cfg.krb5Conf: Location of krb5.conf; defaults to /etc/krb5.conf.
   * cfg.principal: Hive metastore principal, such as hive/_HOST@EXAMPLE.COM.
   * cfg.keytabFile: Path to the Hive metastore keytab file on the hosts that tasks are assigned to.
   * cfg.keytabName: The principal contained in the keytab above, used for the login, such as hive/host144.
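
   To illustrate how a principal containing `_HOST` relates to a concrete one such as `hive/host144`, here is a minimal stand-alone sketch of the placeholder substitution. The class and method names are hypothetical and not part of this PR; Hadoop's `SecurityUtil.getServerPrincipal` performs a similar replacement, but this sketch does not call it and is not its implementation.

   ```java
   import java.util.Locale;

   // Hypothetical helper, for illustration only.
   final class PrincipalSketch {
       static final String HOST_PLACEHOLDER = "_HOST";

       // Replaces the _HOST component of "service/_HOST@REALM" with the given hostname.
       // Principals that do not have exactly this three-part shape are returned unchanged.
       static String resolvePrincipal(String principal, String hostname) {
           String[] parts = principal.split("[/@]");
           if (parts.length != 3 || !HOST_PLACEHOLDER.equals(parts[1])) {
               return principal;
           }
           return parts[0] + "/" + hostname.toLowerCase(Locale.ROOT) + "@" + parts[2];
       }

       public static void main(String[] args) {
           // Assumes the metastore runs on host144, as in the example value above.
           System.out.println(resolvePrincipal("hive/_HOST@EXAMPLE.COM", "host144"));
       }
   }
   ```

   With these inputs the sketch prints `hive/host144@EXAMPLE.COM`, which matches the service and host parts of the example `cfg.keytabName` value.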
   
   Before starting the Flink cluster, I distribute the Hive metastore keytab to the same location on every node of the cluster, for example /home/keydir/hive/hive.service.keytab.
   
   Afterwards, I start a Flink cluster in YARN session mode using the Hive metastore keytab and submit the following SQL statements to the cluster.
   
   ```sql
   CREATE TABLE sourceT (
     uuid varchar(20),
     name varchar(10),
     age int,
     ts timestamp(3),
     `partition` varchar(20)
   ) WITH (
     'connector' = 'datagen',
     'rows-per-second' = '1'
   );
   
   create table t2(
     uuid varchar(20),
     name varchar(10),
     age int,
     ts timestamp(3),
     `partition` varchar(20)
   )
   with (
     'connector' = 'hudi',
     'path' = 'hdfs://host146:8020/tmp/t2',
     'table.type' = 'MERGE_ON_READ',
     'write.bucket_assign.tasks' = '2',
     'write.tasks' = '2',
     'hive_sync.enable' = 'true',
     'hive_sync.mode' = 'hms',
     'hive_sync.metastore.uris' = 'thrift://host145:9083',
     'hive_sync.db' = 'default',
     'hive_sync.table' = 't2',
     'hive_sync.kerberos.enable' = 'true',
     'hive_sync.kerberos.krb5.conf' = '/etc/krb5.conf', 
     'hive_sync.kerberos.principal' = 'hive/_HOST@HDP.COM',
     'hive_sync.kerberos.keytab.file' = '/home/keydir/hive/hive.service.keytab', 
     'hive_sync.kerberos.keytab.name' = 'hive/host144'
   );
   
   insert into t2 select * from sourceT;
   ```
   
   The result is shown below:
   
   ![image](https://user-images.githubusercontent.com/46479816/156407399-3c7ede7c-b5d5-4252-bd7b-98025d2d865f.png)
   
   ![image](https://user-images.githubusercontent.com/46479816/156408301-eecf95fa-8e1e-4e0b-994e-866cc09d3ec3.png)
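
   For reference, the `hive_sync.kerberos.*` options in the SQL above could be read into the config fields roughly as follows. This is only a sketch under assumptions: the option keys are the ones used in my test, while the `KerberosSyncOptions` class, its validation, and the `false` default for the enable flag are hypothetical and not the code in this PR.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical holder for the Kerberos-related sync options, for illustration only.
   final class KerberosSyncOptions {
       final boolean enabled;
       final String krb5Conf;
       final String principal;
       final String keytabFile;
       final String keytabName;

       private KerberosSyncOptions(boolean enabled, String krb5Conf, String principal,
                                   String keytabFile, String keytabName) {
           this.enabled = enabled;
           this.krb5Conf = krb5Conf;
           this.principal = principal;
           this.keytabFile = keytabFile;
           this.keytabName = keytabName;
       }

       // Reads the options from a plain key/value map and fails fast when
       // Kerberos is enabled but a required value is missing.
       static KerberosSyncOptions fromMap(Map<String, String> options) {
           boolean enabled =
               Boolean.parseBoolean(options.getOrDefault("hive_sync.kerberos.enable", "false"));
           String krb5Conf = options.getOrDefault("hive_sync.kerberos.krb5.conf", "/etc/krb5.conf");
           String principal = options.get("hive_sync.kerberos.principal");
           String keytabFile = options.get("hive_sync.kerberos.keytab.file");
           String keytabName = options.get("hive_sync.kerberos.keytab.name");
           if (enabled) {
               requireSet("hive_sync.kerberos.principal", principal);
               requireSet("hive_sync.kerberos.keytab.file", keytabFile);
               requireSet("hive_sync.kerberos.keytab.name", keytabName);
           }
           return new KerberosSyncOptions(enabled, krb5Conf, principal, keytabFile, keytabName);
       }

       private static void requireSet(String key, String value) {
           if (value == null || value.trim().isEmpty()) {
               throw new IllegalArgumentException("Missing required option: " + key);
           }
       }

       public static void main(String[] args) {
           Map<String, String> options = new HashMap<>();
           options.put("hive_sync.kerberos.enable", "true");
           options.put("hive_sync.kerberos.principal", "hive/_HOST@HDP.COM");
           options.put("hive_sync.kerberos.keytab.file", "/home/keydir/hive/hive.service.keytab");
           options.put("hive_sync.kerberos.keytab.name", "hive/host144");
           KerberosSyncOptions cfg = fromMap(options);
           System.out.println(cfg.keytabName + " -> " + cfg.keytabFile + " (" + cfg.krb5Conf + ")");
       }
   }
   ```

   Failing fast on a missing principal or keytab keeps a misconfigured job from reaching the keytab login step with null values.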
   
   



