Posted to dev@spark.apache.org by tigerquoll <ti...@outlook.com> on 2018/10/02 03:44:20 UTC

Re: [Discuss] Datasource v2 support for Kerberos

Hi Steve,
I think that passing a Kerberos keytab around is one of those bad ideas that
is entirely appropriate to re-question every single time you come across it.
It has already been used in Spark when interacting with Kerberos systems
that do not support delegation tokens. Any such system will eventually stop
talking to Spark once the passed Kerberos tickets expire and cannot be
renewed.

It is one of those "best bad idea we have" situations that has arisen, been
discussed to death, and finally, grudgingly, settled with an interim-only
solution: passing the keytab to the worker so that Kerberos tickets can be
renewed. A long-time notable offender in this area is secure Kafka.
Thankfully, Kafka delegation tokens are soon to be supported in Spark,
removing the need to pass keytabs around when interacting with Kafka.
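
As a concrete, hedged sketch of that interim arrangement: the keytab and
principal are normally handed to Spark at submit time, e.g. via spark-submit's
--principal and --keytab flags, which correspond to the YARN-mode configuration
keys shown below; the AM then uses the keytab to keep re-obtaining tickets and
delegation tokens for long-running jobs. The principal and path are
illustrative only.

    import org.apache.spark.SparkConf

    // Equivalent of: spark-submit --principal ... --keytab ...  (Spark 2.x on YARN)
    val conf = new SparkConf()
      .set("spark.yarn.principal", "analytics@EXAMPLE.COM")  // illustrative principal
      .set("spark.yarn.keytab", "/secure/analytics.keytab")  // illustrative keytab path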

This particular thread could probably be better renamed "Generic Datasource
v2 support for Kerberos configuration". I would like to steer away from
discussion of alternative architectures that could handle a lack of
delegation tokens (it is a worthwhile conversation, but a long and involved
one that would distract from this narrowly defined topic) and focus just on
configuration information. A very quick look through various client code has
identified at least the following configuration information that could
potentially be of use to a datasource that uses Kerberos (a sketch of how a
source might accept these follows the list):

* krb5ConfPath
* Kerberos debugging flags
* spark.security.credentials.${service}.enabled
* JAAS config
* ZKServerPrincipal ??
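
Here is a minimal sketch, under made-up option names (none of these keys or
the format name is an existing Spark or connector API), of how a DataSource
v2 implementation could accept that per-source Kerberos configuration through
the ordinary options map:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    val df = spark.read
      .format("com.example.kerberized")                         // hypothetical v2 source
      .option("kerberos.krb5.conf", "/etc/krb5-external.conf")  // hypothetical key
      .option("kerberos.debug", "true")                         // hypothetical key
      .option("kerberos.jaas.conf", "/etc/jaas-external.conf")  // hypothetical key
      .load()

The source would then be responsible for mapping those options onto its own
login/JAAS setup on each executor.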

It is entirely feasible that each datasource may require its own unique
Kerberos configuration (e.g. you are pulling from an external datasource that
has a different KDC than the YARN cluster you are running on).
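
For the two-KDC example just mentioned, a hedged illustration of why this is
awkward today: the JVM resolves Kerberos realms from a single krb5.conf,
selected with the java.security.krb5.conf system property, so the cluster
realm and the external source's realm generally both have to be declared in
that one file (or be linked by cross-realm trust). The realm names and paths
below are made up.

    // Normally set on the JVM command line, e.g.
    //   -Djava.security.krb5.conf=/etc/krb5-both-realms.conf
    // It can also be set programmatically, but only before any Kerberos code runs:
    System.setProperty("java.security.krb5.conf", "/etc/krb5-both-realms.conf")

    // where that file declares both realms, roughly:
    //   [realms]
    //     CLUSTER.EXAMPLE.COM  = { kdc = kdc.cluster.example.com }
    //     EXTERNAL.PARTNER.COM = { kdc = kdc.partner.com }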





Re: [Discuss] Datasource v2 support for Kerberos

Posted by Steve Loughran <st...@hortonworks.com>.

On 2 Oct 2018, at 04:44, tigerquoll <ti...@outlook.com> wrote:

Hi Steve,
I think that passing a Kerberos keytab around is one of those bad ideas that
is entirely appropriate to re-question every single time you come across it.
It has already been used in Spark when interacting with Kerberos systems
that do not support delegation tokens. Any such system will eventually stop
talking to Spark once the passed Kerberos tickets expire and cannot be
renewed.

It is one of those "best bad idea we have" situations that has arisen, been
discussed to death, and finally, grudgingly, settled with an interim-only
solution: passing the keytab to the worker so that Kerberos tickets can be
renewed.

The keytab generally stays with the Spark AM, which pushes out tickets to the workers; I don't believe the workers get to see the keytab itself, do they?

Gabor's diagram in the Kafka SPIP is probably the best illustration of it I've ever seen:
https://docs.google.com/document/d/1ouRayzaJf_N5VQtGhVq9FURXVmRpXzEEWYHob0ne3NY/edit#


A long-time notable offender in this area is secure Kafka. Thankfully, Kafka
delegation tokens are soon to be supported in Spark, removing the need to
pass keytabs around when interacting with Kafka.

This particular thread could probably be better renamed "Generic Datasource
v2 support for Kerberos configuration". I would like to steer away from
discussion of alternative architectures that could handle a lack of
delegation tokens (it is a worthwhile conversation, but a long and involved
one that would distract from this narrowly defined topic) and focus just on
configuration information. A very quick look through various client code has
identified at least the following configuration information that could
potentially be of use to a datasource that uses Kerberos:

* krb5ConfPath
* Kerberos debugging flags

mmm. https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/secrets.html

FWIW, Hadoop 2.8+ has the KDiag entry point, which can also be run inside an application, though there's always the risk that going near UGI too early can "collapse" Kerberos state prematurely.

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/KDiag.java

If Spark needs something like that for Hadoop 2.7.x too, copying and repackaging that class would be a place to start.
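
Relatedly, the "Kerberos debugging flags" from the list above can be switched
on before any Hadoop or Spark security code runs. A hedged sketch; these are
standard JDK and Hadoop switches rather than a Spark API:

    object KerberosDebug {
      def enable(): Unit = {
        // JDK Kerberos and GSS-API tracing
        System.setProperty("sun.security.krb5.debug", "true")
        System.setProperty("sun.security.jgss.debug", "true")
        // Hadoop's JAAS debugging is driven by an environment variable, so it
        // has to be set in the launch environment instead: HADOOP_JAAS_DEBUG=true
      }
    }

    // The KDiag class mentioned above can also be run from the command line:
    //   hadoop org.apache.hadoop.security.KDiag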


* spark.security.credentials.${service}.enabled
* JAAS config
* ZKServerPrincipal ??

It is entirely feasible that each datasource may require its own unique
Kerberos configuration (e.g. you are pulling from an external datasource that
has a different KDC than the YARN cluster you are running on).

This is a use case I've never encountered; instead, everyone relies on cross-AD trust. That's complex enough as it is.