Posted to dev@spark.apache.org by "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <su...@nokia.com> on 2018/07/20 15:06:16 UTC

Query on Spark Hive with kerberos Enabled on Kubernetes

Hi All,
I am trying to use the Spark 2.2.0 Kubernetes fork (https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) to run Hive queries on a Kerberos-enabled cluster. Spark-submits fail for the Hive queries but pass when I access HDFS. Is this a known limitation, or am I doing something wrong? Please let me know. If this is working, could you please share an example of running Hive queries?

Thanks.

Regards
Surya

RE: Query on Spark Hive with kerberos Enabled on Kubernetes

Posted by "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <su...@nokia.com>.
Hi Sandeep,
Any inputs on this?

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Saturday, July 21, 2018 6:50 PM
To: Sandeep Katta <sa...@gmail.com>
Cc: dev@spark.apache.org; user@spark.apache.org
Subject: RE: Query on Spark Hive with kerberos Enabled on Kubernetes

Hi Sandeep,
Thanks for the response.
I am using the following commands (the XML files hive-site.xml, core-site.xml, and hdfs-site.xml are made available by exporting HADOOP_CONF_DIR):
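For example, I export HADOOP_CONF_DIR before invoking spark-submit; the path below is only illustrative and should point at the directory that actually holds those three files:

export HADOOP_CONF_DIR=/etc/hadoop/conf   # directory containing hive-site.xml, core-site.xml, hdfs-site.xml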

For HDFS access (this succeeds):
./spark-submit \
  --deploy-mode cluster \
  --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.kerberos.enabled=true \
  --conf spark.kubernetes.kerberos.principal=<principal> \
  --conf spark.kubernetes.kerberos.keytab=<keytab> \
  --conf spark.kubernetes.driver.docker.image=<driver_img> \
  --conf spark.kubernetes.executor.docker.image=<executor_img> \
  --conf spark.kubernetes.initcontainer.docker.image=<init_img> \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<RSS_IP>:10000 \
  ../examples/src/main/python/wordcount.py hdfs://<HDFS_IP>:8020/tmp/wordcount.txt


For Hive access (this fails):
./spark-submit \
  --deploy-mode cluster \
  --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.kerberos.enabled=true \
  --files /etc/krb5.conf,../examples/src/main/resources/kv1.txt \
  --conf spark.kubernetes.kerberos.principal=<principal> \
  --conf spark.kubernetes.kerberos.keytab=<keytab> \
  --conf spark.kubernetes.driver.docker.image=<driver_img> \
  --conf spark.kubernetes.executor.docker.image=<executor_img> \
  --conf spark.kubernetes.initcontainer.docker.image=<init_img> \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<RSS_IP>:10000 \
  ../examples/src/main/python/sql/hive.py

Following is the error:
2018-07-19 04:15:55 INFO  HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2018-07-19 04:15:56 INFO  metastore:376 - Trying to connect to metastore with URI thrift://vm-10-75-145-54:9083
2018-07-19 04:15:56 ERROR TSaslTransport:315 - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)

If I do not provide krb5.conf in the above spark-submit, I get an error saying it is unable to find any default realm.
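That error indicates the Kerberos libraries could not load a krb5.conf that defines a default realm. A minimal krb5.conf sketch (realm name and KDC host are placeholders) looks like:

[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }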

One workaround I found: if I generate a TGT by running kinit and copy it into the driver pod at /tmp/krb5cc_0, it works fine. I guess this should not be the way to do it; the TGT should be generated automatically from the keytab, and the Hive metastore should then be accessible. Please let me know if I am doing something wrong.
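For reference, that manual workaround amounts to roughly the following, with the keytab path, principal, and driver pod name as placeholders:

kinit -kt <keytab> <principal>
# kinit writes the ticket cache to /tmp/krb5cc_<uid> by default (/tmp/krb5cc_0 when running as root)
kubectl cp /tmp/krb5cc_0 default/<driver-pod>:/tmp/krb5cc_0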

Regards
Surya

From: Sandeep Katta [mailto:sandeep0102.opensource@gmail.com]
Sent: Friday, July 20, 2018 9:59 PM
To: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <su...@nokia.com>
Cc: dev@spark.apache.org; user@spark.apache.org
Subject: Re: Query on Spark Hive with kerberos Enabled on Kubernetes

Can you please tell us what exception you've got, and share any logs for the same?

On Fri, 20 Jul 2018 at 8:36 PM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) <su...@nokia.com> wrote:
Hi All,
I am trying to use the Spark 2.2.0 Kubernetes fork (https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) to run Hive queries on a Kerberos-enabled cluster. Spark-submits fail for the Hive queries but pass when I access HDFS. Is this a known limitation, or am I doing something wrong? Please let me know. If this is working, could you please share an example of running Hive queries?

Thanks.

Regards
Surya


RE: Query on Spark Hive with kerberos Enabled on Kubernetes

Posted by "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <su...@nokia.com>.
Hi Sandeep,
Thanks for the response.
I am using the following commands (the XML files hive-site.xml, core-site.xml, and hdfs-site.xml are made available by exporting HADOOP_CONF_DIR):

For HDFS access (this succeeds):
./spark-submit \
  --deploy-mode cluster \
  --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.kerberos.enabled=true \
  --conf spark.kubernetes.kerberos.principal=<principal> \
  --conf spark.kubernetes.kerberos.keytab=<keytab> \
  --conf spark.kubernetes.driver.docker.image=<driver_img> \
  --conf spark.kubernetes.executor.docker.image=<executor_img> \
  --conf spark.kubernetes.initcontainer.docker.image=<init_img> \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<RSS_IP>:10000 \
  ../examples/src/main/python/wordcount.py hdfs://<HDFS_IP>:8020/tmp/wordcount.txt


For Hive access (this fails):
./spark-submit \
  --deploy-mode cluster \
  --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.kerberos.enabled=true \
  --files /etc/krb5.conf,../examples/src/main/resources/kv1.txt \
  --conf spark.kubernetes.kerberos.principal=<principal> \
  --conf spark.kubernetes.kerberos.keytab=<keytab> \
  --conf spark.kubernetes.driver.docker.image=<driver_img> \
  --conf spark.kubernetes.executor.docker.image=<executor_img> \
  --conf spark.kubernetes.initcontainer.docker.image=<init_img> \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<RSS_IP>:10000 \
  ../examples/src/main/python/sql/hive.py

Following is the error:
2018-07-19 04:15:55 INFO  HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2018-07-19 04:15:56 INFO  metastore:376 - Trying to connect to metastore with URI thrift://vm-10-75-145-54:9083
2018-07-19 04:15:56 ERROR TSaslTransport:315 - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)

If I do not provide krb5.conf in the above spark-submit, I get an error saying it is unable to find any default realm.

One workaround I found: if I generate a TGT by running kinit and copy it into the driver pod at /tmp/krb5cc_0, it works fine. I guess this should not be the way to do it; the TGT should be generated automatically from the keytab, and the Hive metastore should then be accessible. Please let me know if I am doing something wrong.

Regards
Surya

From: Sandeep Katta [mailto:sandeep0102.opensource@gmail.com]
Sent: Friday, July 20, 2018 9:59 PM
To: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <su...@nokia.com>
Cc: dev@spark.apache.org; user@spark.apache.org
Subject: Re: Query on Spark Hive with kerberos Enabled on Kubernetes

Can you please tell us what exception you've got, and share any logs for the same?

On Fri, 20 Jul 2018 at 8:36 PM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) <su...@nokia.com> wrote:
Hi All,
I am trying to use the Spark 2.2.0 Kubernetes fork (https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) to run Hive queries on a Kerberos-enabled cluster. Spark-submits fail for the Hive queries but pass when I access HDFS. Is this a known limitation, or am I doing something wrong? Please let me know. If this is working, could you please share an example of running Hive queries?

Thanks.

Regards
Surya


Re: Query on Spark Hive with kerberos Enabled on Kubernetes

Posted by Sandeep Katta <sa...@gmail.com>.
Can you please tell us what exception you've got, and share any logs for the same?

On Fri, 20 Jul 2018 at 8:36 PM, Garlapati, Suryanarayana (Nokia -
IN/Bangalore) <su...@nokia.com> wrote:

> Hi All,
>
> I am trying to use the Spark 2.2.0 Kubernetes fork (
> https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0)
> to run Hive queries on a Kerberos-enabled cluster. Spark-submits fail for
> the Hive queries but pass when I access HDFS. Is this a known limitation,
> or am I doing something wrong? Please let me know. If this is working,
> could you please share an example of running Hive queries?
>
>
>
> Thanks.
>
>
>
> Regards
>
> Surya
>