Posted to dev@spark.apache.org by Wojciech Indyk <wo...@gmail.com> on 2016/04/08 11:01:30 UTC

Delegation of Kerberos tokens

Hello!
TL;DR Could you explain how (and which) Kerberos tokens should be
delegated from the driver to the workers? Does it depend on the Spark mode?

I have an HDP 2.3 Hadoop cluster with Kerberos. I use spark-sql (1.6.1,
compiled with Hadoop 2.7.1 and Hive 1.2.1) in yarn-cluster mode to
query my Hive tables.
1. When I query a Hive table stored in HDFS, everything is fine (assume
there is no problem with my app, config or credential setup).
2. When I try to query an external HBase table (defined in Hive using
the HBaseHandler), I get a permissions problem on the RPC call from
the Spark workers to the HBase region server (there is no problem
connecting to the HBase master from the driver, or to the ZooKeepers
from both the driver and the workers).
3. When I query the HBase table through Hive (Beeswax), everything is OK
(assume there is no problem with the HBaseHandler).

After some debugging (and writing some additional logging) I can see
that the driver obtains (and delegates) only:
16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for:
16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: 172.xx.xx102:8188
16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: ha-hdfs:dataocean
which means there are credentials only for YARN and HDFS. Is this the
intended behavior? I see another user has a similar doubt:
https://issues.apache.org/jira/browse/SPARK-12279?focusedCommentId=15067020&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15067020

Could you explain how (and which) Kerberos tokens should be delegated
from the driver to the workers? Does it depend on the Spark mode? As far
as I can see in the code, the method obtainTokenForHBase is called in
yarn-client mode, but not in yarn-cluster mode. Am I right? Is that
intended?
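For reference, the extra logging I added boils down to something like
this (a sketch; it assumes the Hadoop client classes are already on the
classpath, as they are inside a Spark driver):

```scala
import scala.collection.JavaConverters._
import org.apache.hadoop.security.UserGroupInformation

// Dump every delegation token held by the current user's credentials;
// on my driver this prints only the YARN and HDFS entries shown above.
val creds = UserGroupInformation.getCurrentUser.getCredentials
creds.getAllTokens.asScala.foreach { token =>
  println(s"token kind=${token.getKind} service=${token.getService}")
}
```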

--
Kind regards/ Pozdrawiam,
Wojciech Indyk
http://datacentric.pl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Delegation of Kerberos tokens

Posted by Steve Loughran <st...@hortonworks.com>.
> On 8 Apr 2016, at 10:01, Wojciech Indyk <wo...@gmail.com> wrote:
> 
> Hello!
> TL;DR Could you explain how (and which) Kerberos tokens should be
> delegated from the driver to the workers? Does it depend on the Spark mode?

Hadoop delegation tokens, not Kerberos tickets... though the original Kerberos tickets are used to acquire the tokens.

The most up-to-date coverage of the topic in general is:

http://hortonworks.com/webinar/hadoop-and-kerberos-the-madness-beyond-the-gate/
https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details


> [...]
> Could you explain how (and which) Kerberos tokens should be delegated
> from the driver to the workers? Does it depend on the Spark mode? As far
> as I can see in the code, the method obtainTokenForHBase is called in
> yarn-client mode, but not in yarn-cluster mode. Am I right? Is that
> intended?
> 

The tokens are picked up in both cases: Spark introspects on Hive and HBase if they are on the classpath, looks at their configs, decides if tokens are needed, and asks for them if it thinks they are.

They're then attached to the AM launch context and passed down to the containers afterwards.
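If the automatic pickup fails for some reason, one workaround (a sketch,
not Spark's own code path; it assumes the HBase client jars are on the
driver's classpath and the driver has a logged-in Kerberos user) is to
fetch and attach the token yourself before the job runs:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.security.token.TokenUtil
import org.apache.hadoop.security.UserGroupInformation

// Use the driver's Kerberos ticket to ask the HBase master for a
// delegation token, then attach it to the current user's credentials
// so it is shipped to the executors alongside the YARN and HDFS tokens.
val hbaseConf = HBaseConfiguration.create()
val hbaseToken = TokenUtil.obtainToken(hbaseConf)
UserGroupInformation.getCurrentUser.addToken(hbaseToken)
```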

see also https://github.com/steveloughran/spark/blob/stevel/feature/SPARK-13148-oozie/docs/running-on-yarn.md
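One more aside, not strictly about the HBase problem: for long-running
yarn-cluster jobs, recent Spark releases can log in from a keytab and
refresh tokens themselves. The principal and keytab path below are
placeholders, not values from this thread:

```
spark-submit \
  --master yarn-cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/myuser.keytab \
  ...
```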