Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/11/04 16:03:00 UTC

[jira] [Work logged] (HIVE-21894) Hadoop credential password storage for the Kafka Storage handler when security is SSL

     [ https://issues.apache.org/jira/browse/HIVE-21894?focusedWorklogId=338174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338174 ]

ASF GitHub Bot logged work on HIVE-21894:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Nov/19 16:02
            Start Date: 04/Nov/19 16:02
    Worklog Time Spent: 10m 
      Work Description: justinleet commented on pull request #839: HIVE-21894: Hadoop credential password storage for the Kafka Storage handler when security is SSL
URL: https://github.com/apache/hive/pull/839
 
 
   [HIVE-21894](https://issues.apache.org/jira/browse/HIVE-21894)
   
   Allows the KafkaStorageHandler to be configured with SSL properties without storing the passwords in plaintext in the table configs.
   
   This has been tested on an actual Hadoop cluster against an actual Kafka cluster, but in a pretty limited manner and primarily on the consumer side (full disclosure: my use case is almost exclusively reads).  I've done some basic testing to make sure that both queries that don't spin up jobs (e.g. simple `SELECT *` type queries) and queries that do spin up jobs (e.g. some basic `GROUP BY`) run to success.
   
   There are a couple of things that probably need some feedback and possibly iteration.
   
   - Distribution of the key/trust stores. Kafka can only work with these stores locally, but they need to be distributed for jobs, so HDFS seems like the right place to keep them. Right now, it's an HDFS file that is being pulled via the standard HDFS APIs into `DOWNLOADED_RESOURCES_DIR`.  Other StorageHandlers (see: [HIVE-21894](https://issues.apache.org/jira/browse/HIVE-21894?focusedCommentId=16869476&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16869476)) do some dealing with files, but they seem to deal with jars and go through the `tmpjars` config (which I believe is essentially just `-libjars`).
     - Is this the right place to put the files?
     - Is there a more reasonable way to get them?
   - Right now, producer / consumer SSL configs are assumed to be the same (i.e. `hive.kafka.ssl.keystore.password` instead of `hive.kafka.consumer.ssl ...` and `hive.kafka.producer.ssl ...`).
     - This could fairly easily be split out if there's a need. I'm honestly not sure how often a producer and consumer would be configured separately in practice.
   - Naming of the configs. If there are any particular conventions I should follow, let me know and I'll test and update.
   - Automated testing. Given the need for HDFS and Kafka, I've just added some tests that the configs end up reasonable, but we may want more and I'm not familiar enough with Hive's testing utilities to know if there are better options.
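
To make the producer / consumer split mentioned above concrete, here is a minimal Java sketch of how shared keys could be combined with optional per-role overrides. It is an illustration only, not the code in this pull request: the class name `KafkaSslConfigSketch`, the `resolve` helper, and the mapping from `hive.kafka.ssl.<name>` to the Kafka client key `ssl.<name>` are all assumptions, and the values are treated as opaque strings (credential lookup is out of scope here).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive or of this pull request.
final class KafkaSslConfigSketch {
    // Shared prefix used by the table properties described above.
    static final String SHARED_PREFIX = "hive.kafka.ssl.";

    /**
     * Builds Kafka client SSL settings for one role ("consumer" or "producer").
     * Shared hive.kafka.ssl.* entries are applied first; role-specific
     * hive.kafka.<role>.ssl.* entries, if present, override them.
     */
    static Map<String, String> resolve(Map<String, String> tableProps, String role) {
        String rolePrefix = "hive.kafka." + role + ".ssl.";
        Map<String, String> clientProps = new HashMap<>();

        // Pass 1: shared settings become the defaults.
        for (Map.Entry<String, String> e : tableProps.entrySet()) {
            if (e.getKey().startsWith(SHARED_PREFIX)) {
                String name = e.getKey().substring(SHARED_PREFIX.length());
                clientProps.put("ssl." + name, e.getValue());
            }
        }

        // Pass 2: role-specific settings replace the shared defaults.
        for (Map.Entry<String, String> e : tableProps.entrySet()) {
            if (e.getKey().startsWith(rolePrefix)) {
                String name = e.getKey().substring(rolePrefix.length());
                clientProps.put("ssl." + name, e.getValue());
            }
        }
        return clientProps;
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new HashMap<>();
        tableProps.put("hive.kafka.ssl.keystore.password", "shared-value");
        tableProps.put("hive.kafka.consumer.ssl.keystore.password", "consumer-value");

        System.out.println("consumer: " + resolve(tableProps, "consumer"));
        System.out.println("producer: " + resolve(tableProps, "producer"));
    }
}
```

With this layout, tables that only set the shared keys keep working unchanged, and a role-specific key is only needed when the consumer and producer genuinely differ.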
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 338174)
    Remaining Estimate: 0h
            Time Spent: 10m

> Hadoop credential password storage for the Kafka Storage handler when security is SSL
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21894
>                 URL: https://issues.apache.org/jira/browse/HIVE-21894
>             Project: Hive
>          Issue Type: Improvement
>          Components: kafka integration
>    Affects Versions: 4.0.0
>            Reporter: Kristopher Kane
>            Assignee: Kristopher Kane
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Kafka storage handler assumes that if the Hive service is configured with Kerberos then the destination Kafka cluster is also secured with the same Kerberos realm or trust of realms.  The security configuration of the Kafka client can be overridden thanks to the additive handling of the Kafka client configs, but the only way to specify SSL and the keystore/truststore user/pass is via plain text table properties.
> This ticket proposes adding Hadoop credential security to the Kafka storage handler in support of SSL secured Kafka clusters.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)