Posted to issues@kudu.apache.org by "Dan Burkert (JIRA)" <ji...@apache.org> on 2018/01/29 21:00:00 UTC

[jira] [Commented] (KUDU-2259) kudu-spark imports authentication token into client multiple times

    [ https://issues.apache.org/jira/browse/KUDU-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344014#comment-16344014 ] 

Dan Burkert commented on KUDU-2259:
-----------------------------------

I'm not sure this is entirely fixed.  We had a big conversation about this on Slack on 1/12/2018.  I think we identified three things we need to fix to call this completely done:

 
 # Stop sending authentication tokens from the master to clients when the channel is not confidential (i.e. encrypted). This is what Alexey fixed in 1f346. It fixes the immediate issue, but if I understand correctly, the executors will now get new authn tokens with the {{yarn}} username, which is going to break things when authorization lands.
 # Kudu clients should stop accepting authentication tokens from the master when the channel is not confidential. This is symmetric to #1, and is required to fix the issue for new clients/Spark versions connecting to old masters.
 # If the Spark driver doesn't have an authentication token ({{--rpc_encryption=disabled}}), then we need to tell the executors which username to use, to work around the caveat in #1. I think this can be done by modifying {{AuthenticationCredentialsPB}} in client.proto:

{code:java}
// Proposed change to client.proto: wrap the existing authn_token field in a
// oneof so the credentials can instead carry a plain username when the
// driver has no token to hand out.
message AuthenticationCredentialsPB {
  oneof credential {
    security.SignedTokenPB authn_token = 1;
    string plain_username = 3;
  }

  repeated bytes ca_cert_ders = 2;
}{code}

> kudu-spark imports authentication token into client multiple times
> ------------------------------------------------------------------
>
>                 Key: KUDU-2259
>                 URL: https://issues.apache.org/jira/browse/KUDU-2259
>             Project: Kudu
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.6.0
>            Reporter: Will Berkeley
>            Priority: Major
>             Fix For: 1.7.0
>
>
> kudu-spark should have one KuduContext per task, which is serialized on the driver along with an authentication token and sent to the executors. The KuduContext either retrieves a Kudu client from a JVM-scoped cache or creates one and adds it to the cache, and then imports its authentication token into that client.
> Under the default configuration in an un-Kerberized cluster, the client uses this authentication token to connect to the cluster. However, if --rpc_encryption=disabled, the client will not use the authentication token. The master then issues a new authentication token to the client, and this new token replaces the old one in the client.
> While there's one KuduContext per task, multiple tasks may run on the same executor. When that happens, each KuduContext tries to import its own authentication token into the shared, cached client. If the client has already received a token from the master (because encryption is disabled), the KuduContext's token and the master-issued token may belong to different users: the KuduContext's token was issued on the driver to the driver's Unix user, while the master-issued token was issued to the executor's user.
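> Here's a minimal Java sketch of that cache-and-import pattern (illustrative only; the real logic lives in KuduContext.scala, and the class and method names here are made up):
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import org.apache.kudu.client.KuduClient;
>
> class CachedClients {
>   // One client per master address list, shared by every task in the executor JVM.
>   private static final Map<String, KuduClient> CACHE = new ConcurrentHashMap<>();
>
>   static KuduClient getOrCreate(String masterAddrs, byte[] authnCreds) {
>     KuduClient client = CACHE.computeIfAbsent(
>         masterAddrs, a -> new KuduClient.KuduClientBuilder(a).build());
>     // Every task's KuduContext re-imports its driver-issued token. If the
>     // cached client already holds a master-issued token for a different user
>     // (e.g. 'yarn' vs. 'root'), this import throws IllegalArgumentException.
>     client.importAuthenticationCredentials(authnCreds);
>     return client;
>   }
> }
> {code}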
> An example of the exception that occurred when running spark2-shell as root:
> {noformat}
> 18/01/11 12:14:01 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, kudu-tserver-01, executor 1): java.lang.IllegalArgumentException: cannot import authentication data from a different user: old='yarn', new='root'
> 	at org.apache.kudu.client.SecurityContext.checkUserMatches(SecurityContext.java:128)
> 	at org.apache.kudu.client.SecurityContext.importAuthenticationCredentials(SecurityContext.java:138)
> 	at org.apache.kudu.client.AsyncKuduClient.importAuthenticationCredentials(AsyncKuduClient.java:677)
> 	at org.apache.kudu.spark.kudu.KuduContext.asyncClient$lzycompute(KuduContext.scala:103)
> 	at org.apache.kudu.spark.kudu.KuduContext.asyncClient(KuduContext.scala:100)
> 	at org.apache.kudu.spark.kudu.KuduContext.syncClient$lzycompute(KuduContext.scala:98)
> 	at org.apache.kudu.spark.kudu.KuduContext.syncClient(KuduContext.scala:98)
> 	at org.apache.kudu.spark.kudu.KuduRDD.compute(KuduRDD.scala:71)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:108)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)