Posted to user@spark.apache.org by Foster Langbein <fo...@riskfrontiers.com> on 2018/10/15 07:03:15 UTC

kerberos auth for MS SQL server jdbc driver

Has anyone gotten spark to write to SQL server using Kerberos
authentication with Microsoft's JDBC driver? I'm having limited success,
though in theory it should work.

I'm using a YARN-mode 4-node Spark 2.3.0 cluster and trying to write a
simple table to SQL Server 2016. I can get it to work if I use SQL server
credentials, however this is not an option in my application. I need to use
windows authentication - so-called integratedSecurity - and in particular I
want to use a keytab file.

The solution half works - the spark driver creates a table on SQL server -
so I'm pretty confident the Kerberos implementation/credentials etc are
set up correctly and valid. However the executors then fail to write any
data to the table with an exception: "java.security.PrivilegedActionException:
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)"

After much tracing/debugging it seems executors are behaving differently to
the spark driver and ignoring the specification to use the credentials
supplied in the keytab and instead trying to use the default spark cluster
user. I simply haven't been able to force them to use what's in the keytab
after trying many, many variations.

Very grateful if anyone has any help/suggestions/ideas on how to get this
to work.


-- 



*Dr Foster Langbein* | Chief Technology Officer | Risk Frontiers

Level 2, 100 Christie St, St Leonards, NSW, 2065


Telephone: +61 2 8459 9777

Email: foster.langbein@riskfrontiers.com | Website: www.riskfrontiers.com




*Risk Modelling | Risk Management | Resilience | Disaster Management
| Social Research Australia | New Zealand | Asia Pacific*

Re: kerberos auth for MS SQL server jdbc driver

Posted by Foster Langbein <fo...@riskfrontiers.com>.
Thanks Marcelo, that makes a lot of sense to me now. Do you know if there
are any plans to expand Kerberos auth to the executors? The current
executor behaviour is quite curious - you can see in the trace information
that it consumes the jaas conf file and keytab (indeed they're required -
it will fail with file not found if you put in something bogus) but then
goes off and tries to use the simple auth local user credentials. I hacked
something nasty into the MS JDBC driver as an experiment - nulling out the
credentials it receives - and then it loads from the keytab correctly and
everything works.

The UserGroupInformation approach seemed promising (I don't mind distributing
the keytab and I understand --keytab will not work). I tried something
simple-minded:
val remoteUgi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(...,
...)
remoteUgi.doAs(... df.write.jdbc(..., ...) )

but this just replicated the same behaviour - it works on the driver and fails
on the executors. So I'm guessing you meant something more fine-grained when
you say "manage the Kerberos login in your code that runs in executors"? If
there's any example code you can point to I'd be grateful (I couldn't find
the Kafka example).
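
For what it's worth, the more fine-grained version I'm planning to try next is
to do the login inside a closure that runs on the executors - roughly the
sketch below (the principal, keytab path and JDBC URL are placeholders, the
keytab is assumed to have been shipped to every executor separately, e.g. with
--files, and I haven't verified this end-to-end):

import java.security.PrivilegedExceptionAction
import java.sql.DriverManager
import org.apache.hadoop.security.UserGroupInformation

df.rdd.foreachPartition { rows =>
  // log in from the keytab inside the executor JVM, where the JDBC connection is opened
  val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
    "svc_spark@EXAMPLE.COM", // placeholder principal
    "svc_spark.keytab")      // placeholder path in the executor working directory
  ugi.doAs(new PrivilegedExceptionAction[Unit] {
    override def run(): Unit = {
      val conn = DriverManager.getConnection(
        "jdbc:sqlserver://sqlhost:1433;databaseName=mydb;" +
          "integratedSecurity=true;authenticationScheme=JavaKerberos")
      try {
        // insert this partition's rows with a PreparedStatement here
      } finally {
        conn.close()
      }
    }
  })
}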

On Tue, Oct 16, 2018 at 3:32 AM Marcelo Vanzin <va...@cloudera.com> wrote:

> Spark only does Kerberos authentication on the driver. For executors it
> currently only supports Hadoop's delegation tokens for Kerberos.
>
> To use something that does not support delegation tokens you have to
> manually manage the Kerberos login in your code that runs in executors,
> which might be tricky. It means distributing the keytab yourself (not with
> Spark's --keytab argument) and calling into the UserGroupInformation API
> directly.
>
> I don't have any examples of that, though, maybe someone does. (We have a
> similar example for Kafka on our blog somewhere, but not sure how far that
> will get you with MS SQL.)
>
>
> On Mon, Oct 15, 2018 at 12:04 AM Foster Langbein <
> foster.langbein@riskfrontiers.com> wrote:
>
>> Has anyone gotten spark to write to SQL server using Kerberos
>> authentication with Microsoft's JDBC driver? I'm having limited success,
>> though in theory it should work.
>>
>> I'm using a YARN-mode 4-node Spark 2.3.0 cluster and trying to write a
>> simple table to SQL Server 2016. I can get it to work if I use SQL server
>> credentials, however this is not an option in my application. I need to
>> use windows authentication - so-called integratedSecurity - and in
>> particular I want to use a keytab file.
>>
>> The solution half works - the spark driver creates a table on SQL server
>> - so I'm pretty confident the Kerberos implementation/credentials etc are
>> set up correctly and valid. However the executors then fail to write any
>> data to the table with an exception: "java.security.PrivilegedActionException:
>> GSSException: No valid credentials provided (Mechanism level: Failed to
>> find any Kerberos tgt)"
>>
>> After much tracing/debugging it seems executors are behaving differently
>> to the spark driver and ignoring the specification to use the credentials
>> supplied in the keytab and instead trying to use the default spark cluster
>> user. I simply haven't been able to force them to use what's in the keytab
>> after trying many, many variations.
>>
>> Very grateful if anyone has any help/suggestions/ideas on how to get this
>> to work.
>>
>>
>> --
>>
>>
>>
>> *Dr Foster Langbein* | Chief Technology Officer | Risk Frontiers
>>
>> Level 2, 100 Christie St, St Leonards, NSW, 2065
>>
>>
>> Telephone: +61 2 8459 9777
>>
>> Email: foster.langbein@riskfrontiers.com | Website: www.riskfrontiers.com
>>
>>
>>
>>
>> *Risk Modelling | Risk Management | Resilience | Disaster Management
>> | Social Research Australia | New Zealand | Asia Pacific*
>>
>>
>>
>
>
> --
> Marcelo
>

Re: kerberos auth for MS SQL server jdbc driver

Posted by Foster Langbein <fo...@riskfrontiers.com>.
Thanks Luca, that seems like a neat workaround. I had a go at getting it to
work - I'm using spark-submit, but I figured the same idea should apply. Do
you know if this technique must use a credential cache (TGT) belonging to the
same user the spark job executes as?
In my case I want to use a separate service account that is known to SQL
server and defined in the keytab - not the same as the local user spark runs
as. I ran into similar problems as before - the executors seem to ignore
anything that isn't about the spark user - and I'm actually not sure they
were taking any notice of where to pick up the TGT as supplied in
spark.executorEnv.KRB5CCNAME at all.
Thanks for the ideas however.
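
In case it's useful for comparison, this is roughly the shape of what I ran
(the service principal, keytab path, cache file name and jar are placeholders,
and since I never got it working, treat it as a sketch of the attempt rather
than a recipe):

# obtain a TGT into an explicit credential cache file on the submitting host
kinit -kt /path/to/service.keytab -c /tmp/krb5cc_spark svc_spark@EXAMPLE.COM

# ship the cache file to the executors and point KRB5CCNAME at it; with --files
# the cache lands in each executor's working directory under its base name
spark-submit \
  --master yarn \
  --files /tmp/krb5cc_spark \
  --conf spark.executorEnv.KRB5CCNAME=krb5cc_spark \
  my_job.jar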

On Tue, Oct 16, 2018 at 7:09 AM Luca Canali <Lu...@cern.ch> wrote:

> We have a case where we interact with a Kerberized service and found a
> simple workaround to distribute and make use of the driver’s Kerberos
> credential cache file in the executors. Maybe some of the ideas there can
> be of help for this case too? Our case is on Linux though. Details:
> https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_Executors_Kerberos_HowTo.md
>
>
>
> Regards,
> Luca
>
>
>
> *From:* Marcelo Vanzin <va...@cloudera.com.INVALID>
> *Sent:* Monday, October 15, 2018 18:32
> *To:* foster.langbein@riskfrontiers.com
> *Cc:* user <us...@spark.apache.org>
> *Subject:* Re: kerberos auth for MS SQL server jdbc driver
>
>
>
> Spark only does Kerberos authentication on the driver. For executors it
> currently only supports Hadoop's delegation tokens for Kerberos.
>
>
>
> To use something that does not support delegation tokens you have to
> manually manage the Kerberos login in your code that runs in executors,
> which might be tricky. It means distributing the keytab yourself (not with
> Spark's --keytab argument) and calling into the UserGroupInformation API
> directly.
>
>
>
> I don't have any examples of that, though, maybe someone does. (We have a
> similar example for Kafka on our blog somewhere, but not sure how far that
> will get you with MS SQL.)
>
>
>
>
>
> On Mon, Oct 15, 2018 at 12:04 AM Foster Langbein <
> foster.langbein@riskfrontiers.com> wrote:
>
> Has anyone gotten spark to write to SQL server using Kerberos
> authentication with Microsoft's JDBC driver? I'm having limited success,
> though in theory it should work.
>
>
>
> I'm using a YARN-mode 4-node Spark 2.3.0 cluster and trying to write a
> simple table to SQL Server 2016. I can get it to work if I use SQL server
> credentials, however this is not an option in my application. I need to use
> windows authentication - so-called integratedSecurity - and in particular I
> want to use a keytab file.
>
>
>
> The solution half works - the spark driver creates a table on SQL server -
> so I'm pretty confident the Kerberos implementation/credentials etc are
> set up correctly and valid. However the executors then fail to write any
> data to the table with an exception:
> "java.security.PrivilegedActionException: GSSException: No valid
> credentials provided (Mechanism level: Failed to find any Kerberos tgt)"
>
>
>
> After much tracing/debugging it seems executors are behaving differently
> to the spark driver and ignoring the specification to use the credentials
> supplied in the keytab and instead trying to use the default spark cluster
> user. I simply haven't been able to force them to use what's in the keytab
> after trying many, many variations.
>
>
>
> Very grateful if anyone has any help/suggestions/ideas on how to get this
> to work.
>
>
>
>
>
> --
>
>
> *Dr Foster Langbein* | Chief Technology Officer | Risk Frontiers
>
> Level 2, 100 Christie St, St Leonards, NSW, 2065
>
>
>
> Telephone: +61 2 8459 9777
>
> Email: foster.langbein@riskfrontiers.com | Website: www.riskfrontiers.com
>
>
>
>
> *Risk Modelling | Risk Management | Resilience | Disaster Management
> | Social Research Australia | New Zealand | Asia Pacific*
>
>
>
>
>
>
> --
>
> Marcelo
>


-- 



*Dr Foster Langbein* | Chief Technology Officer | Risk Frontiers

Level 2, 100 Christie St, St Leonards, NSW, 2065


Telephone: +61 2 8459 9777

Email: foster.langbein@riskfrontiers.com | Website: www.riskfrontiers.com




*Risk Modelling | Risk Management | Resilience | Disaster Management
| Social Research Australia | New Zealand | Asia Pacific*

RE: kerberos auth for MS SQL server jdbc driver

Posted by Luca Canali <Lu...@cern.ch>.
We have a case where we interact with a Kerberized service and found a simple workaround to distribute and make use of the driver’s Kerberos credential cache file in the executors. Maybe some of the ideas there can be of help for this case too? Our case is on Linux though. Details: https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_Executors_Kerberos_HowTo.md

Regards,
Luca

From: Marcelo Vanzin <va...@cloudera.com.INVALID>
Sent: Monday, October 15, 2018 18:32
To: foster.langbein@riskfrontiers.com
Cc: user <us...@spark.apache.org>
Subject: Re: kerberos auth for MS SQL server jdbc driver

Spark only does Kerberos authentication on the driver. For executors it currently only supports Hadoop's delegation tokens for Kerberos.

To use something that does not support delegation tokens you have to manually manage the Kerberos login in your code that runs in executors, which might be tricky. It means distributing the keytab yourself (not with Spark's --keytab argument) and calling into the UserGroupInformation API directly.

I don't have any examples of that, though, maybe someone does. (We have a similar example for Kafka on our blog somewhere, but not sure how far that will get you with MS SQL.)


On Mon, Oct 15, 2018 at 12:04 AM Foster Langbein <fo...@riskfrontiers.com> wrote:
Has anyone gotten spark to write to SQL server using Kerberos authentication with Microsoft's JDBC driver? I'm having limited success, though in theory it should work.

I'm using a YARN-mode 4-node Spark 2.3.0 cluster and trying to write a simple table to SQL Server 2016. I can get it to work if I use SQL server credentials, however this is not an option in my application. I need to use windows authentication - so-called integratedSecurity - and in particular I want to use a keytab file.

The solution half works - the spark driver creates a table on SQL server - so I'm pretty confident the Kerberos implementation/credentials etc are set up correctly and valid. However the executors then fail to write any data to the table with an exception: "java.security.PrivilegedActionException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)"

After much tracing/debugging it seems executors are behaving differently to the spark driver and ignoring the specification to use the credentials supplied in the keytab and instead trying to use the default spark cluster user. I simply haven't been able to force them to use what's in the keytab after trying many, many variations.

Very grateful if anyone has any help/suggestions/ideas on how to get this to work.


--



Dr Foster Langbein | Chief Technology Officer | Risk Frontiers

Level 2, 100 Christie St, St Leonards, NSW, 2065



Telephone: +61 2 8459 9777

Email: foster.langbein@riskfrontiers.com | Website: www.riskfrontiers.com



Risk Modelling | Risk Management | Resilience | Disaster Management | Social Research
Australia | New Zealand | Asia Pacific





--
Marcelo

Re: kerberos auth for MS SQL server jdbc driver

Posted by Marcelo Vanzin <va...@cloudera.com.INVALID>.
Spark only does Kerberos authentication on the driver. For executors it
currently only supports Hadoop's delegation tokens for Kerberos.

To use something that does not support delegation tokens you have to
manually manage the Kerberos login in your code that runs in executors,
which might be tricky. It means distributing the keytab yourself (not with
Spark's --keytab argument) and calling into the UserGroupInformation API
directly.

I don't have any examples of that, though, maybe someone does. (We have a
similar example for Kafka on our blog somewhere, but not sure how far that
will get you with MS SQL.)


On Mon, Oct 15, 2018 at 12:04 AM Foster Langbein <
foster.langbein@riskfrontiers.com> wrote:

> Has anyone gotten spark to write to SQL server using Kerberos
> authentication with Microsoft's JDBC driver? I'm having limited success,
> though in theory it should work.
>
> I'm using a YARN-mode 4-node Spark 2.3.0 cluster and trying to write a
> simple table to SQL Server 2016. I can get it to work if I use SQL server
> credentials, however this is not an option in my application. I need to
> use windows authentication - so-called integratedSecurity - and in
> particular I want to use a keytab file.
>
> The solution half works - the spark driver creates a table on SQL server -
> so I'm pretty confident the Kerberos implementation/credentials etc are
> set up correctly and valid. However the executors then fail to write any
> data to the table with an exception: "java.security.PrivilegedActionException:
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)"
>
> After much tracing/debugging it seems executors are behaving differently
> to the spark driver and ignoring the specification to use the credentials
> supplied in the keytab and instead trying to use the default spark cluster
> user. I simply haven't been able to force them to use what's in the keytab
> after trying many, many variations.
>
> Very grateful if anyone has any help/suggestions/ideas on how to get this
> to work.
>
>
> --
>
>
>
> *Dr Foster Langbein* | Chief Technology Officer | Risk Frontiers
>
> Level 2, 100 Christie St, St Leonards, NSW, 2065
>
>
> Telephone: +61 2 8459 9777
>
> Email: foster.langbein@riskfrontiers.com | Website: www.riskfrontiers.com
>
>
>
>
> *Risk Modelling | Risk Management | Resilience | Disaster Management
> | Social Research Australia | New Zealand | Asia Pacific*
>
>
>


-- 
Marcelo