Posted to user@spark.apache.org by Mu Kong <ko...@gmail.com> on 2017/06/23 09:10:12 UTC

Question about standalone Spark cluster reading from Kerberized Hadoop

Hi, all!

I was trying to read from a Kerberized Hadoop cluster from a standalone
Spark cluster.
Right now, I'm hitting an authentication failure from Kerberos:


java.io.IOException: Failed on local exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot
authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
"XXXXXXXXXXXX"; destination host is: XXXXXXXXXXXXXXX;



I checked with klist, and the principal/realm is correct.
I also used the hdfs command line to poke HDFS from all the nodes, and it
worked.
And if I submit the job in local (client) mode, it runs fine.
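
Concretely, the checks that succeeded looked roughly like this (the
principal, realm, and paths are placeholders):

  kinit -kt /path/to/user.keytab user@EXAMPLE.COM
  klist                                 # shows a valid TGT
  hdfs dfs -ls /user/mu                 # works from every node
  spark-submit --master local[2] --class MyApp my-app.jar   # also works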

I tried copying everything from hadoop/conf and hive/conf into spark/conf.
I also tried editing spark/conf/spark-env.sh to add
SPARK_SUBMIT_OPTS/SPARK_MASTER_OPTS/SPARK_SLAVE_OPTS/HADOOP_CONF_DIR/HIVE_CONF_DIR,
and tried exporting them in .bashrc as well.
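
The spark-env.sh additions were along these lines (the paths are examples
from my setup):

  export HADOOP_CONF_DIR=/etc/hadoop/conf
  export HIVE_CONF_DIR=/etc/hive/conf
  export SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
  export SPARK_MASTER_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
  export SPARK_SLAVE_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"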

However, I'm still experiencing the same exception.

Then I read some posts about problems with Kerberized Hadoop, such as the
following one:
http://blog.stratio.com/spark-kerberos-safe-story/
which indicates that we cannot access Kerberized HDFS from a standalone
Spark cluster.

I'm using Spark 2.1.1. Is it still the case that we can't access Kerberized
HDFS with 2.1.1?

Thanks!


Best regards,
Mu

Re: Question about standalone Spark cluster reading from Kerberized Hadoop

Posted by Mu Kong <ko...@gmail.com>.
Thanks for your prompt responses!

@Steve

I have actually put my keytabs on all the nodes already, and I used them to
kinit on each server.

But how can I make Spark use my keytab and principal when I start the
cluster or submit the job? Or is there a way to let Spark use the ticket
cache on each node?

I tried --keytab and --principal when I submitted the job, and still got the
same error. I guess those options are for YARN only.
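
For reference, the submit command was along these lines (the master host,
principal, and paths are placeholders):

  spark-submit \
    --master spark://master-host:7077 \
    --principal user@EXAMPLE.COM \
    --keytab /path/to/user.keytab \
    --class MyApp my-app.jar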


Re: Question about standalone Spark cluster reading from Kerberized Hadoop

Posted by Steve Loughran <st...@hortonworks.com>.
On 23 Jun 2017, at 10:22, Saisai Shao <sa...@gmail.com> wrote:

> Spark running with the standalone cluster manager currently doesn't
> support accessing secure Hadoop. Basically, the problem is that
> standalone-mode Spark doesn't have the facility to distribute
> delegation tokens.
>
> Currently only Spark on YARN or local mode supports secure Hadoop.
>
> Thanks
> Jerry


There's possibly an ugly workaround where you ssh in to every node and log in directly to your KDC using a keytab you pushed out... that would eliminate the need for anything related to Hadoop delegation tokens. After all, that's essentially what Spark-on-YARN does when you give it a keytab.
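
Something like the following, very roughly; the hostnames and paths are made up, and you'd need to re-run the kinit before the tickets expire (e.g. from cron):

  for host in worker1 worker2 worker3; do
    scp /secure/user.keytab $host:/home/user/user.keytab
    ssh $host kinit -kt /home/user/user.keytab user@EXAMPLE.COM
  done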


see also:  https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details


Re: Question about standalone Spark cluster reading from Kerberized Hadoop

Posted by Saisai Shao <sa...@gmail.com>.
Spark running with the standalone cluster manager currently doesn't
support accessing secure Hadoop. Basically, the problem is that
standalone-mode Spark doesn't have the facility to distribute
delegation tokens.

Currently only Spark on YARN or local mode supports secure Hadoop.
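
To illustrate the distinction: local mode works because the Hadoop client
in the driver JVM can read the Kerberos ticket cache of the user who
submitted the job, so something like this is enough (principal is a
placeholder):

  kinit user@EXAMPLE.COM
  spark-submit --master local[4] --class MyApp my-app.jar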

Thanks
Jerry
