Posted to commits@flink.apache.org by se...@apache.org on 2017/01/11 20:19:46 UTC

[8/8] flink git commit: [FLINK-5364] [security] Fix documentation setup for Kerberos

[FLINK-5364] [security] Fix documentation setup for Kerberos


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/9cbb0a9e
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/9cbb0a9e
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/9cbb0a9e

Branch: refs/heads/master
Commit: 9cbb0a9ee53a80fc2e533d3cea18df4381575951
Parents: fc3a778
Author: Stephan Ewen <se...@apache.org>
Authored: Wed Jan 11 14:32:36 2017 +0100
Committer: Stephan Ewen <se...@apache.org>
Committed: Wed Jan 11 20:17:44 2017 +0100

----------------------------------------------------------------------
 docs/internals/flink_security.md | 146 ----------------------------------
 docs/ops/security-kerberos.md    | 145 +++++++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+), 146 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/9cbb0a9e/docs/internals/flink_security.md
----------------------------------------------------------------------
diff --git a/docs/internals/flink_security.md b/docs/internals/flink_security.md
deleted file mode 100644
index a83f3b9..0000000
--- a/docs/internals/flink_security.md
+++ /dev/null
@@ -1,146 +0,0 @@
----
-title:  "Flink Security"
-# Top navigation
-top-nav-group: internals
-top-nav-pos: 10
-top-nav-title: Flink Security
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-This document briefly describes how Flink security works in the context of various deployment mechanisms (Standalone, YARN, or Mesos), 
-filesystems, connectors, and state backends.
-
-## Objective
-The primary goals of the Flink Kerberos security infrastructure are:
-1. to enable secure data access for jobs within a cluster via connectors (e.g. Kafka)
-2. to authenticate to ZooKeeper (if configured to use SASL)
-3. to authenticate to Hadoop components (e.g. HDFS, HBase) 
-
-In a production deployment scenario, streaming jobs are understood to run for long periods of time (days/weeks/months) and be able to authenticate to secure 
-data sources throughout the life of the job.  Kerberos keytabs do not expire in that timeframe, unlike a Hadoop delegation token
-or ticket cache entry.
-
-The current implementation supports running Flink clusters (Job Manager/Task Manager/jobs) with either a configured keytab credential
-or with Hadoop delegation tokens.   Keep in mind that all jobs share the credential configured for a given cluster.   To use a different keytab
-for a certain job, simply launch a separate Flink cluster with a different configuration.  Numerous Flink clusters may run side-by-side in a YARN
-or Mesos environment.
-
-## How Flink Security works
-In concept, a Flink program may use first- or third-party connectors (Kafka, HDFS, Cassandra, Flume, Kinesis, etc.), each of which may require its own authentication method (Kerberos, SSL/TLS, username/password, etc.).  While satisfying the security requirements for all connectors is an ongoing effort,
-Flink provides first-class support for Kerberos authentication only.  The following services and connectors are tested for Kerberos authentication:
-
-- Kafka (0.9+)
-- HDFS
-- HBase
-- ZooKeeper
-
-Note that it is possible to enable Kerberos independently for each service or connector.  For example, the user may enable
-Hadoop security without using Kerberos for ZooKeeper, or vice versa.  The shared element is the configuration of
-Kerberos credentials, which is then explicitly used by each component.
-
-The internal architecture is based on security modules (implementing `org.apache.flink.runtime.security.modules.SecurityModule`) which
-are installed at startup.  The next section describes each security module.
-
-### Hadoop Security Module
-This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a process-wide *login user* context.   The login user is
-then used for all interactions with Hadoop, including HDFS, HBase, and YARN.
-
-If Hadoop security is enabled (in `core-site.xml`), the login user will have whatever Kerberos credential is configured.  Otherwise,
-the login user conveys only the user identity of the OS account that launched the cluster.
-
-### JAAS Security Module
-This module provides a dynamic JAAS configuration to the cluster, making available the configured Kerberos credential to ZooKeeper,
-Kafka, and other such components that rely on JAAS.
-
-Note that the user may also provide a static JAAS configuration file using the mechanisms described in the [Java SE Documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html).   Static entries override any
-dynamic entries provided by this module.
-
-### ZooKeeper Security Module
-This module configures certain process-wide ZooKeeper security-related settings, namely the ZooKeeper service name (default: `zookeeper`)
-and the JAAS login context name (default: `Client`).
-
-## Security Configuration
-
-### Flink Configuration
-The user's Kerberos ticket cache (managed with `kinit`) is used automatically, based on the following configuration option:
-
-- `security.kerberos.login.use-ticket-cache`: Indicates whether to read from the user's Kerberos ticket cache (default: `true`).
-
-A Kerberos keytab can be supplied by adding the following configuration options to the Flink configuration file:
-
-- `security.kerberos.login.keytab`: Absolute path to a Kerberos keytab file that contains the user credentials.
-
-- `security.kerberos.login.principal`: Kerberos principal name associated with the keytab.
-
-These configuration options establish a cluster-wide credential to be used in a Hadoop and/or JAAS context.  Whether the credential is used in a Hadoop context is based on the Hadoop configuration (see next section).   To be used in a JAAS context, the configuration specifies which JAAS *login contexts* (or *applications*) are enabled with the following configuration option:
-
-- `security.kerberos.login.contexts`: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, `Client` to use the credentials for ZooKeeper authentication).
-
-ZooKeeper-related configuration overrides:
-
-- `zookeeper.sasl.service-name`: The Kerberos service name that the ZooKeeper cluster is configured to use (default: `zookeeper`). Facilitates mutual authentication between the client (Flink) and the server.
-
-- `zookeeper.sasl.login-context-name`: The JAAS login context name that the ZooKeeper client uses (default: `Client`). Should match
-one of the values specified in `security.kerberos.login.contexts`.
-
-### Hadoop Configuration
-
-The Hadoop configuration is located via the `HADOOP_CONF_DIR` environment variable and by other means (see `org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils`).   The Kerberos credential (configured above) is used automatically if Hadoop security is enabled.
-
-Note that Kerberos credentials found in the ticket cache are not transferable to other hosts.  In this scenario, the Flink CLI acquires Hadoop
-delegation tokens (for HDFS and for HBase).
-
-## Deployment Modes
-Here is some information specific to each deployment mode.
-
-### Standalone Mode
-
-Steps to run a secure Flink cluster in standalone/cluster mode:
-1. Add security-related configuration options to the Flink configuration file (on all cluster nodes).
-2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on all cluster nodes.
-3. Deploy Flink cluster as normal.
-
-### YARN/Mesos Mode
-
-Steps to run a secure Flink cluster in YARN/Mesos mode:
-1. Add security-related configuration options to the Flink configuration file on the client.
-2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on the client node.
-3. Deploy Flink cluster as normal.
-
-In YARN/Mesos mode, the keytab is automatically copied from the client to the Flink containers.
-
-For more information, see <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">YARN security</a> documentation.
-
-#### Using `kinit` (YARN only)
-
-In YARN mode, it is possible to deploy a secure Flink cluster without a keytab, using only the ticket cache (as managed by `kinit`).
-This avoids the complexity of generating a keytab and avoids entrusting the cluster manager with it.  The main drawback is
-that the cluster is necessarily short-lived since the generated delegation tokens will expire (typically within a week).
-
-Steps to run a secure Flink cluster using `kinit`:
-1. Add security-related configuration options to the Flink configuration file on the client.
-2. Log in using the `kinit` command.
-3. Deploy Flink cluster as normal.
-
-## Further Details
-### Ticket Renewal
-Each component that uses Kerberos is independently responsible for renewing the Kerberos ticket-granting ticket (TGT).
-Hadoop, ZooKeeper, and Kafka all renew the TGT automatically when provided a keytab.  In the delegation token scenario,
-YARN itself renews the token (up to its maximum lifespan).

http://git-wip-us.apache.org/repos/asf/flink/blob/9cbb0a9e/docs/ops/security-kerberos.md
----------------------------------------------------------------------
diff --git a/docs/ops/security-kerberos.md b/docs/ops/security-kerberos.md
new file mode 100644
index 0000000..2afe760
--- /dev/null
+++ b/docs/ops/security-kerberos.md
@@ -0,0 +1,145 @@
+---
+title:  "Kerberos Authentication Setup and Configuration"
+nav-parent_id: setup
+nav-pos: 10
+nav-title: Kerberos
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document briefly describes how Flink security works in the context of various deployment mechanisms (Standalone, YARN, or Mesos), 
+filesystems, connectors, and state backends.
+
+## Objective
+The primary goals of the Flink Kerberos security infrastructure are:
+1. to enable secure data access for jobs within a cluster via connectors (e.g. Kafka)
+2. to authenticate to ZooKeeper (if configured to use SASL)
+3. to authenticate to Hadoop components (e.g. HDFS, HBase) 
+
+In a production deployment scenario, streaming jobs are understood to run for long periods of time (days/weeks/months) and be able to authenticate to secure 
+data sources throughout the life of the job.  Kerberos keytabs do not expire in that timeframe, unlike a Hadoop delegation token
+or ticket cache entry.
+
+The current implementation supports running Flink clusters (Job Manager/Task Manager/jobs) with either a configured keytab credential
+or with Hadoop delegation tokens.   Keep in mind that all jobs share the credential configured for a given cluster.   To use a different keytab
+for a certain job, simply launch a separate Flink cluster with a different configuration.  Numerous Flink clusters may run side-by-side in a YARN
+or Mesos environment.
+
+## How Flink Security works
+In concept, a Flink program may use first- or third-party connectors (Kafka, HDFS, Cassandra, Flume, Kinesis, etc.), each of which may require its own authentication method (Kerberos, SSL/TLS, username/password, etc.).  While satisfying the security requirements for all connectors is an ongoing effort,
+Flink provides first-class support for Kerberos authentication only.  The following services and connectors are tested for Kerberos authentication:
+
+- Kafka (0.9+)
+- HDFS
+- HBase
+- ZooKeeper
+
+Note that it is possible to enable Kerberos independently for each service or connector.  For example, the user may enable
+Hadoop security without using Kerberos for ZooKeeper, or vice versa.  The shared element is the configuration of
+Kerberos credentials, which is then explicitly used by each component.
+
+The internal architecture is based on security modules (implementing `org.apache.flink.runtime.security.modules.SecurityModule`) which
+are installed at startup.  The next section describes each security module.
+
+### Hadoop Security Module
+This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a process-wide *login user* context.   The login user is
+then used for all interactions with Hadoop, including HDFS, HBase, and YARN.
+
+If Hadoop security is enabled (in `core-site.xml`), the login user will have whatever Kerberos credential is configured.  Otherwise,
+the login user conveys only the user identity of the OS account that launched the cluster.
+
+### JAAS Security Module
+This module provides a dynamic JAAS configuration to the cluster, making available the configured Kerberos credential to ZooKeeper,
+Kafka, and other such components that rely on JAAS.
+
+Note that the user may also provide a static JAAS configuration file using the mechanisms described in the [Java SE Documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html).   Static entries override any
+dynamic entries provided by this module.
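+
+As a sketch, a static JAAS file entry providing a keytab to the `Client` login context might look like the following (the keytab path and principal are placeholders for your environment, not values shipped with Flink):
+
+```
+Client {
+  com.sun.security.auth.module.Krb5LoginModule required
+  useKeyTab=true
+  storeKey=true
+  useTicketCache=false
+  keyTab="/path/to/flink.keytab"
+  principal="flink-user@EXAMPLE.COM";
+};
+```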
+
+### ZooKeeper Security Module
+This module configures certain process-wide ZooKeeper security-related settings, namely the ZooKeeper service name (default: `zookeeper`)
+and the JAAS login context name (default: `Client`).
+
+## Security Configuration
+
+### Flink Configuration
+The user's Kerberos ticket cache (managed with `kinit`) is used automatically, based on the following configuration option:
+
+- `security.kerberos.login.use-ticket-cache`: Indicates whether to read from the user's Kerberos ticket cache (default: `true`).
+
+A Kerberos keytab can be supplied by adding the following configuration options to the Flink configuration file:
+
+- `security.kerberos.login.keytab`: Absolute path to a Kerberos keytab file that contains the user credentials.
+
+- `security.kerberos.login.principal`: Kerberos principal name associated with the keytab.
+
+These configuration options establish a cluster-wide credential to be used in a Hadoop and/or JAAS context.  Whether the credential is used in a Hadoop context is based on the Hadoop configuration (see next section).   To be used in a JAAS context, the configuration specifies which JAAS *login contexts* (or *applications*) are enabled with the following configuration option:
+
+- `security.kerberos.login.contexts`: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, `Client` to use the credentials for ZooKeeper authentication).
+
+ZooKeeper-related configuration overrides:
+
+- `zookeeper.sasl.service-name`: The Kerberos service name that the ZooKeeper cluster is configured to use (default: `zookeeper`). Facilitates mutual authentication between the client (Flink) and the server.
+
+- `zookeeper.sasl.login-context-name`: The JAAS login context name that the ZooKeeper client uses (default: `Client`). Should match
+one of the values specified in `security.kerberos.login.contexts`.
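+
+Taken together, a minimal keytab-based security section of the Flink configuration file might look like the following sketch (the keytab path, principal, and context names are placeholders for your environment):
+
+```
+security.kerberos.login.use-ticket-cache: false
+security.kerberos.login.keytab: /path/to/flink.keytab
+security.kerberos.login.principal: flink-user@EXAMPLE.COM
+security.kerberos.login.contexts: Client,KafkaClient
+zookeeper.sasl.service-name: zookeeper
+zookeeper.sasl.login-context-name: Client
+```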
+
+### Hadoop Configuration
+
+The Hadoop configuration is located via the `HADOOP_CONF_DIR` environment variable and by other means (see `org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils`).   The Kerberos credential (configured above) is used automatically if Hadoop security is enabled.
+
+Note that Kerberos credentials found in the ticket cache are not transferable to other hosts.  In this scenario, the Flink CLI acquires Hadoop
+delegation tokens (for HDFS and for HBase).
+
+## Deployment Modes
+Here is some information specific to each deployment mode.
+
+### Standalone Mode
+
+Steps to run a secure Flink cluster in standalone/cluster mode:
+1. Add security-related configuration options to the Flink configuration file (on all cluster nodes).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on all cluster nodes.
+3. Deploy Flink cluster as normal.
+
+### YARN/Mesos Mode
+
+Steps to run a secure Flink cluster in YARN/Mesos mode:
+1. Add security-related configuration options to the Flink configuration file on the client.
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on the client node.
+3. Deploy Flink cluster as normal.
+
+In YARN/Mesos mode, the keytab is automatically copied from the client to the Flink containers.
+
+For more information, see <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">YARN security</a> documentation.
+
+#### Using `kinit` (YARN only)
+
+In YARN mode, it is possible to deploy a secure Flink cluster without a keytab, using only the ticket cache (as managed by `kinit`).
+This avoids the complexity of generating a keytab and avoids entrusting the cluster manager with it.  The main drawback is
+that the cluster is necessarily short-lived since the generated delegation tokens will expire (typically within a week).
+
+Steps to run a secure Flink cluster using `kinit`:
+1. Add security-related configuration options to the Flink configuration file on the client.
+2. Log in using the `kinit` command.
+3. Deploy Flink cluster as normal.
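+
+As a sketch, the client-side workflow for the steps above might look like the following commands (the principal, session parameters, and job jar are placeholders):
+
+```
+# obtain a ticket-granting ticket into the local ticket cache
+kinit alice@EXAMPLE.COM
+
+# start a Flink session on YARN, then submit a job as usual
+./bin/yarn-session.sh -n 2
+./bin/flink run ./examples/streaming/WordCount.jar
+```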
+
+## Further Details
+### Ticket Renewal
+Each component that uses Kerberos is independently responsible for renewing the Kerberos ticket-granting ticket (TGT).
+Hadoop, ZooKeeper, and Kafka all renew the TGT automatically when provided a keytab.  In the delegation token scenario,
+YARN itself renews the token (up to its maximum lifespan).