Posted to dev@gobblin.apache.org by "Jay Sen (Jira)" <ji...@apache.org> on 2020/11/02 19:38:00 UTC
[jira] [Updated] (GOBBLIN-1308) Gobblin's kerberos token management for remote clusters
[ https://issues.apache.org/jira/browse/GOBBLIN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Sen updated GOBBLIN-1308:
-----------------------------
Description:
Gobblin's Hadoop token/key management:
Problem: When key management is enabled, Gobblin only maintains tokens for the local cluster; it has no capability to manage tokens for remote Hadoop clusters. (Based on my conversations with several folks here, the token files can be made available externally, but that would require an external system running on cron or something similar.)
Solution: Add remote-cluster token management to Gobblin, so that remote clusters' keys can be managed the same way it manages the local cluster's keys.
The config would look like the following
(this also renames the enable.key.management config to key.management.enabled):
{code:java}
gobblin.yarn.key.management {
  enabled = true
  remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1}, ${gobblin_sync_systems.hadoop_cluster2} ]
}

// These Gobblin platform configurations can be moved to a database for other use-cases,
// but this layout helps keep the platform modular for each connector.
gobblin_sync_systems {
  hadoop_cluster1 {
    // If hadoop_config_path is specified, the FileSystem will be created from the XML
    // configs found there, which contain all the required info.
    hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
    // If hadoop_config_path is not specified, you can still list the specific nodes
    // for each type of token.
    namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", "hdfs://nn2.hadoop_cluster1.example.com:8020"]
    kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292", "kms2.hadoop_cluster1.example.com:9292" ]
  }
  hadoop_cluster2 {
    hadoop_config_path = "file:///etc/hadoop_cluster2/hadoop/config"
    namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", "hdfs://nn2.hadoop_cluster2.example.com:8020"]
    kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292", "kms2.hadoop_cluster2.example.com:9292" ]
  }
}{code}
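The precedence implied by the comments in the config above (use hadoop_config_path when present, otherwise fall back to the explicit NameNode/KMS node lists) could be sketched as follows. This is a hypothetical illustration only; the class and method names are assumptions for this sketch, not existing Gobblin APIs.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of one entry under gobblin_sync_systems
// (names are illustrative, not actual Gobblin classes).
class RemoteClusterConfig {
    final Optional<String> hadoopConfigPath; // e.g. file:///etc/hadoop_cluster1/hadoop/config
    final List<String> namenodeUris;         // consulted only when no config path is given
    final List<String> kmsNodes;             // consulted only when no config path is given

    RemoteClusterConfig(Optional<String> hadoopConfigPath,
                        List<String> namenodeUris, List<String> kmsNodes) {
        this.hadoopConfigPath = hadoopConfigPath;
        this.namenodeUris = namenodeUris;
        this.kmsNodes = kmsNodes;
    }

    // Decide where token-granting endpoints come from, mirroring the comments
    // in the config example: a full Hadoop config dir wins; otherwise the
    // explicitly listed NameNode/KMS endpoints are used.
    String tokenSource() {
        if (hadoopConfigPath.isPresent()) {
            return "hadoop-config:" + hadoopConfigPath.get();
        }
        return "explicit-nodes: nn=" + namenodeUris + " kms=" + kmsNodes;
    }
}

public class TokenSourceDemo {
    public static void main(String[] args) {
        RemoteClusterConfig withPath = new RemoteClusterConfig(
            Optional.of("file:///etc/hadoop_cluster1/hadoop/config"),
            List.of(), List.of());
        RemoteClusterConfig nodesOnly = new RemoteClusterConfig(
            Optional.empty(),
            List.of("hdfs://nn1.hadoop_cluster2.example.com:8020"),
            List.of("kms1.hadoop_cluster2.example.com:9292"));

        System.out.println(withPath.tokenSource());
        System.out.println(nodesOnly.tokenSource());
    }
}
```

In a real implementation, the first branch would load the XML files from the config dir into a Hadoop Configuration before creating the FileSystem, while the second would target the listed endpoints directly for each token type.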
was:
Gobblin's Hadoop token/key management:
Problem: When key management is enabled, Gobblin only maintains tokens for the local cluster; it has no capability to manage tokens for remote Hadoop clusters. (Based on my conversations with several folks here, the token files can be made available externally, but that would require an external system running on cron or something similar.)
Solution: Add remote-cluster token management to Gobblin, so that remote clusters' keys can be managed the same way it manages the local cluster's keys.
> Gobblin's kerberos token management for remote clusters
> -------------------------------------------------------
>
> Key: GOBBLIN-1308
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1308
> Project: Apache Gobblin
> Issue Type: Improvement
> Affects Versions: 0.15.0
> Reporter: Jay Sen
> Priority: Major
> Fix For: 0.16.0
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)