Posted to user@flink.apache.org by Oleksandr Nitavskyi <o....@criteo.com> on 2017/12/14 16:17:17 UTC

Flink long-running streaming job, Keytab authentication

Hello all,

I have a question about Kerberos authentication in a YARN environment for a long-running streaming job. According to the documentation ( https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/security-kerberos.html#yarnmesos-mode ), Flink’s solution is to use a keytab to authenticate within the YARN perimeter.

If a keytab is configured, Flink uses the UserGroupInformation#loginUserFromKeytab method to authenticate. The YARN security documentation (
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#keytabs-for-am-and-containers-distributed-via-yarn ) states that this should be enough:

Launched containers must themselves log in via UserGroupInformation.loginUserFromKeytab(). UGI handles the login, and schedules a background thread to relogin the user periodically.

But in reality, if we check the source code of UGI, we can see that no background thread is created (see https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1153 ). The method only creates a javax.security.auth.login.LoginContext and performs the authentication. This appears to be true for several Hadoop branches: 2.7, 2.8, 3.0, and trunk. Flink doesn’t create any background thread either (see https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L69 ). As a result, my job loses its credentials for the ResourceManager and HDFS after some time (12 hours in my case).

It looks like UGI’s code is not aligned with the documentation, since it doesn’t relogin periodically.
Do you think adding a background thread that periodically calls UserGroupInformation#reloginFromKeytab could be a solution?
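
To make the proposal concrete, here is a minimal sketch of such a background thread, not a tested patch. The class name KeytabReloginSketch and the method startPeriodicRelogin are hypothetical names chosen for illustration. The relogin step is passed in as a Runnable so the block compiles without hadoop-common on the classpath. In a real job it would wrap a Hadoop call such as UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(), which is shown only in a comment. The period is also an assumption and would have to be shorter than the ticket lifetime.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: not part of Flink or Hadoop.
final class KeytabReloginSketch {

    /**
     * Starts a daemon thread that runs reloginAction every `period` units.
     * With hadoop-common available, reloginAction would typically be:
     *
     *   () -> {
     *       try {
     *           UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
     *       } catch (IOException e) {
     *           throw new UncheckedIOException(e);
     *       }
     *   }
     */
    public static ScheduledExecutorService startPeriodicRelogin(
            Runnable reloginAction, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "keytab-relogin");
                    t.setDaemon(true); // must not keep the JVM alive on shutdown
                    return t;
                });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                reloginAction.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so log the failure and try again on the next tick.
                System.err.println("Relogin attempt failed: " + e);
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

The returned scheduler can be shut down with shutdownNow() when the job stops.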

P.S. We are running Flink as a single job on YARN.



Re: Flink long-running streaming job, Keytab authentication

Posted by Eron Wright <er...@gmail.com>.
To my knowledge the various RPC clients take care of renewal (whether
reactively or using a renewal thread).  Some examples:
https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L638
https://github.com/apache/kafka/blob/0.10.2/clients/src/main/java/org/apache/kafka/common/security/kerberos/KerberosLogin.java#L139

So I don't think Flink needs a renewal thread but the overall situation is
complex.  Some stack traces and logs may be needed to understand the issue.
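
To illustrate what "reactive" renewal means here, below is a minimal sketch of the general pattern: try the call, and on failure relogin and retry. It is a simplification for illustration, not how the linked Hadoop or Kafka clients are actually written. The names ReactiveReloginSketch and callWithRelogin are hypothetical, and the relogin step is an injected Runnable standing in for a keytab relogin.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: illustrates relogin-on-failure, nothing more.
final class ReactiveReloginSketch {

    /**
     * Runs `call`. If it throws, runs `relogin` and tries again,
     * up to `maxRetries` additional attempts. Rethrows the last failure.
     */
    public static <T> T callWithRelogin(
            Callable<T> call, Runnable relogin, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    relogin.run(); // refresh credentials before the next try
                }
            }
        }
        throw last;
    }
}
```

A real client would normally retry only on authentication failures rather than on every exception.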

Eron
