Posted to dev@storm.apache.org by arunmahadevan <gi...@git.apache.org> on 2017/05/16 07:20:28 UTC

[GitHub] storm pull request #2099: STORM-2501: Auto populate Hive Credentials

Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2099#discussion_r116670032
  
    --- Diff: external/storm-hive/README.md ---
    @@ -101,6 +101,80 @@ Hive Trident state also follows similar pattern to HiveBolt it takes in HiveOpti
        TridentState state = stream.partitionPersist(factory, hiveFields, new HiveUpdater(), new Fields());
      ```
        
    +   
    +      
    +## Working with Secure Hive
    +If your topology is going to interact with secure Hive, your bolts/states need to be authenticated by Hive Server. We 
    +currently have 2 options to support this:
    +
    +### Using keytabs on all worker hosts
    +If you have distributed the keytab files for the hive user on all potential worker hosts then you can use this method. You should specify the 
    +hive configs using the methods HiveOptions.withKerberosKeytab() and HiveOptions.withKerberosPrincipal().
    +
    +On worker hosts the bolt/trident-state code will use the keytab file with the principal provided in the config to authenticate with 
    +Hive. This method is a little dangerous as you need to ensure all workers have the keytab file at the same location and you need
    +to remember this as you bring up new hosts in the cluster.
    +
    +
    +### Using Hive MetaStore delegation tokens 
    +Your administrator can configure nimbus to automatically get delegation tokens on behalf of the topology submitter user.
    +Since Hive depends on HDFS, we should also configure HDFS delegation tokens. Nimbus should be started with the following configurations:
    +
    +More details about Hadoop Tokens here: https://github.com/apache/storm/blob/master/docs/storm-hive.md
    +
    +```
    +nimbus.autocredential.plugins.classes : ["org.apache.storm.hive.security.AutoHive", "org.apache.storm.hdfs.security.AutoHDFS"]
    +nimbus.credential.renewers.classes : ["org.apache.storm.hive.security.AutoHive", "org.apache.storm.hdfs.security.AutoHDFS"]
    +nimbus.credential.renewers.freq.secs : 82800 (23 hours)
    +
    +hive.keytab.file: "/path/to/keytab/on/nimbus" (This is the keytab of hive super user that can impersonate other users.)
    +hive.kerberos.principal: "superuser@EXAMPLE.com"
    +hive.metastore.uris: "thrift://server:9083"
    +
    +//hdfs configs
    +hdfs.keytab.file: "/path/to/keytab/on/nimbus" (This is the keytab of hdfs super user that can impersonate other users.)
    +hdfs.kerberos.principal: "superuser@EXAMPLE.com" 
    +```
    +
    +Your topology configuration should have:
    +
    +```
    +topology.auto-credentials :["org.apache.storm.hive.security.AutoHive", "org.apache.storm.hdfs.security.AutoHDFS"]
    +```
    +
    +If nimbus did not have the above configuration you need to add it and then restart nimbus. Ensure the hadoop configuration 
    +files (core-site.xml, hdfs-site.xml and hive-site.xml) and the storm-hive connector jar with all the dependencies are present in nimbus's classpath.
    +
    +As an alternative to adding the configuration files (core-site.xml, hdfs-site.xml and hive-site.xml) to the classpath, you could specify the configurations
    +as a part of the topology configuration. E.g. in your custom storm.yaml (or with the -c option while submitting the topology):
    +
    +```
    +hiveCredentialsConfigKeys : ["cluster1", "cluster2"] (the hive clusters you want to fetch the tokens from)
    +cluster1: [{"config1": "value1", "config2": "value2", ... }] (A map of config key-values specific to cluster1)
    +cluster2: [{"config1": "value1", "hive.keytab.file": "/path/to/keytab/for/cluster2/on/nimbus", "hive.kerberos.principal": "cluster2user@EXAMPLE.com", "hive.metastore.uris": "thrift://server:9083"}] (here along with other configs, we have custom keytab and principal for "cluster2" which will override the keytab/principal specified at topology level)
    +
    +hdfsCredentialsConfigKeys : ["cluster1", "cluster2"] (the hdfs clusters you want to fetch the tokens from)
    +cluster1: [{"config1": "value1", "config2": "value2", ... }] (A map of config key-values specific to cluster1)
    +cluster2: [{"config1": "value1", "hdfs.keytab.file": "/path/to/keytab/for/cluster2/on/nimbus", "hdfs.kerberos.principal": "cluster2user@EXAMPLE.com"}] (here along with other configs, we have custom keytab and principal for "cluster2" which will override the keytab/principal specified at topology level)
    --- End diff --
    
    The cluster value should be a map. Take a look at the hdfs and hbase docs, which were fixed recently.
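
    As a rough sketch of what a map-valued form might look like (this assumes the same shape as the hdfs docs, which should be checked; the keys and values are just the placeholders from the diff, and the `...` elisions are kept as written):

    ```
    hiveCredentialsConfigKeys : ["cluster1", "cluster2"]
    "cluster1": {"config1": "value1", "config2": "value2", ... }
    "cluster2": {"config1": "value1", "hive.keytab.file": "/path/to/keytab/for/cluster2/on/nimbus", ... }
    ```

    Here each cluster key maps directly to a map of config key-values, rather than to a list wrapping that map. The hdfsCredentialsConfigKeys entries would presumably change in the same way.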

