Posted to issues@spark.apache.org by "Jelmer Kuperus (JIRA)" <ji...@apache.org> on 2018/06/19 09:26:00 UTC

[jira] [Commented] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode

    [ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516856#comment-16516856 ] 

Jelmer Kuperus commented on SPARK-5158:
---------------------------------------

I ended up with the following workaround, which at first glance seems to work:

1. Create a `.java.login.config` file in the home directory of the user running Spark, with the following contents:


{noformat}
// default JAAS entry used by the Java GSS-API when initiating Kerberos contexts
com.sun.security.jgss.krb5.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=true
  ticketCache="/tmp/krb5cc_0"
  keyTab="/path/to/my.keytab"
  principal="user@FOO.COM";
};{noformat}
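
On a stock JVM this file is picked up automatically from ${user.home}/.java.login.config. If your setup does not do that, you can point the JVM at the file explicitly; the path below assumes the spark user's home directory is /home/spark:

{noformat}
-Djava.security.auth.login.config=/home/spark/.java.login.config{noformat}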

2. Put a krb5.conf file at /etc/krb5.conf, along the lines of the sketch below.
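
A minimal one should do; the realm matches the principal in the JAAS config above, and kdc.foo.com is a placeholder for your actual KDC:

{noformat}
[libdefaults]
  default_realm = FOO.COM

[realms]
  FOO.COM = {
    kdc = kdc.foo.com
  }{noformat}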

3. Place your Hadoop configuration in /etc/hadoop/conf and in `core-site.xml` set the following (see the XML sketch after this list):
 * fs.defaultFS to webhdfs://your_hostname:14000/webhdfs/v1
 * hadoop.security.authentication to kerberos
 * hadoop.security.authorization to true
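
Spelled out, those three settings in `core-site.xml` come to roughly this (your_hostname is a placeholder):

{noformat}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://your_hostname:14000/webhdfs/v1</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>{noformat}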

4. Make sure the Hadoop config directory is on Spark's classpath, e.g. the process should have something like this in it:
{noformat}
-cp /etc/spark/:/usr/share/spark/jars/*:/etc/hadoop/conf/{noformat}
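
One way to get /etc/hadoop/conf onto both the driver and executor classpaths is through `conf/spark-defaults.conf`; this is a sketch I have not verified across deploy modes:

{noformat}
# prepend the Hadoop config directory to the driver and executor classpaths
spark.driver.extraClassPath    /etc/hadoop/conf
spark.executor.extraClassPath  /etc/hadoop/conf{noformat}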

> Allow for keytab-based HDFS security in Standalone mode
> -------------------------------------------------------
>
>                 Key: SPARK-5158
>                 URL: https://issues.apache.org/jira/browse/SPARK-5158
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Matthew Cheah
>            Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS clusters in standalone mode. The main reason we haven't accepted these patches has been that they rely on insecure distribution of token files from the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the Spark driver and executors independently log in and acquire credentials using a keytab. This would work for users who have dedicated, single-tenant Spark clusters (i.e. they are willing to have a keytab on every machine running Spark for their application). It wouldn't address all possible deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated hardware since they are long-running services.


