Posted to reviews@spark.apache.org by ifilonenko <gi...@git.apache.org> on 2018/10/11 22:35:42 UTC

[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

GitHub user ifilonenko opened a pull request:

    https://github.com/apache/spark/pull/22704

    [SPARK-25681][K8S][WIP] Leverage a config to tune renewal time retrieval 

    ## What changes were proposed in this pull request?
    
    Changes to core allow the K8s backend to pass a SparkConf setting that specifies whether the `obtainDelegationTokens()` logic fetches the `renewalInterval` (in some use cases the DT renewal may be external to Spark, e.g. K8s cluster mode) and whether this renewal interval is calculated by retrieving a second DT, as YARN does.
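    The gate could look roughly like this; the config key name below is illustrative only and may not match the actual patch:

```scala
// Hypothetical sketch of the proposed config gate; the key name is
// illustrative and not necessarily what the patch uses.
def shouldFetchRenewalInterval(conf: Map[String, String]): Boolean =
  conf.getOrElse("spark.kubernetes.kerberos.fetchRenewalInterval", "true").toBoolean
```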
    
    ## How was this patch tested?
    
    - [ ] Unit Tests
    - [ ] Integration Tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ifilonenko/spark SPARK-25681

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22704
    
----
commit cba8abbaadb44ff064d348324da7922e514b9387
Author: Ilan Filonenko <if...@...>
Date:   2018-10-11T22:26:39Z

    first WIP commit

commit 6e807e169cc9113c5fcd1653e610ec473c1ff8e8
Author: Ilan Filonenko <if...@...>
Date:   2018-10-11T22:30:06Z

    modified sentence

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    **[Test build #97286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97286/testReport)** for PR 22704 at commit [`6e807e1`](https://github.com/apache/spark/commit/6e807e169cc9113c5fcd1653e610ec473c1ff8e8).


---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3905/
    Test PASSed.


---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3905/



---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    **[Test build #97286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97286/testReport)** for PR 22704 at commit [`6e807e1`](https://github.com/apache/spark/commit/6e807e169cc9113c5fcd1653e610ec473c1ff8e8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3905/



---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97286/
    Test PASSed.


---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    @vanzin and @jerryshao for opinions on the approach. The classes are quite difficult to unit-test with mocking libraries, so a refactor may be necessary, but I was wondering whether the overall approach would still work with YARN, and whether introducing a K8s-specific config here is okay.


---



[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22704#discussion_r226069637
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
    @@ -49,8 +49,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
         val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
     
         // Get the token renewal interval if it is not set. It will only be called once.
    -    if (tokenRenewalInterval == null) {
    -      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
    +    // If running a Kerberos job on Kubernetes, you may specify that you wish to not
    +    // obtain the tokenRenewal interval, as the renewal service may be external.
    --- End diff --
    
    What does "as the renewal service may be external" mean?
    
    If you're in this code, there are two options:
    - you want Spark to renew tokens, in which case you need the interval.
    - you do not want Spark to renew tokens, in which case you should give Spark neither a principal nor a keytab.
    
    The principal/keytab combo is NOT a replacement for kinit. It has always been, and always will be, the way to tell Spark that you want Spark to renew tokens itself. The current k8s backend is broken in that regard.
    
    And BTW, I know what you mean when you mention an external renewal service. But again, that does not exist, and until it does, you should not do things that assume its existence.
    
    Now as for how to avoid the extra token: that does not need a configuration at all. The extra token is needed on YARN because, to learn the renewal interval, you have to call `renew()` on the token, and that fails for a token created with YARN as the renewer.
    
    So to fix this:
    - check that this is running in YARN, and create the extra token
    - if it's not running on YARN, just call "renew()" on the existing token
    
    And when, and if, there is an external renewal service, a lot of this will have to change in the first place.
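    A rough sketch of that branching, with a stub token type standing in for Hadoop's so the example is self-contained (none of these names are the real Spark or Hadoop APIs):

```scala
// Stub standing in for Hadoop's token type; renew() here just returns a
// fake next-expiration time so the example runs on its own.
case class StubToken(id: String) {
  def renew(): Long = System.currentTimeMillis() + 24L * 60 * 60 * 1000
}

// Decide how to learn the renewal interval, per the two cases above.
def renewalIntervalMs(master: String,
                      token: StubToken,
                      createExtraToken: () => StubToken): Long = {
  val now = System.currentTimeMillis()
  if (master == "yarn") {
    // On YARN, renew() on the original token fails (YARN is the renewer),
    // so create a second token just to probe the interval.
    createExtraToken().renew() - now
  } else {
    // Elsewhere, renew() the existing token directly.
    token.renew() - now
  }
}
```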


---



[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22704#discussion_r226391491
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
    @@ -49,8 +49,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
         val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
     
         // Get the token renewal interval if it is not set. It will only be called once.
    -    if (tokenRenewalInterval == null) {
    -      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
    +    // If running a Kerberos job on Kubernetes, you may specify that you wish to not
    +    // obtain the tokenRenewal interval, as the renewal service may be external.
    --- End diff --
    
    This is going to be kind of a long reply. I've explained most of this in a long comment I left in your k8s-krb google doc. I should probably put that in a readme here since it's really easy to get confused.
    
    > an external token service that places tokens into Secrets may exist within a company organization
    
    Lots of things here.
    
    1. You're talking about a service that creates tokens and gives them to Spark. That is unrelated to the code you're touching; this code is Spark's code for creating its own tokens. In the service scenario you're talking about, this code either wouldn't run at all, or would be a no-op. (And if that doesn't happen it's a bug somewhere in Spark. That's also, e.g., how Oozie gives Spark delegation tokens for other services.)
    
    2. As I mention above, this code is about Spark creating its own tokens. If you're talking about a "renewal service" in this context, you need to define the interface between Spark and the service, so Spark can authenticate and provide the tokens it created to that service.
    
    YARN has that functionality, but it's all very specific to YARN, and the code lives in the YARN backend in Spark, not here. If k8s had that functionality natively, you could do that in the k8s backend. But if you talk about a generic external service without any code in Spark to interact with that service, that's what I keep saying doesn't exist.
    
    3. The renewal you talk about and the renewal this code talks about are not the same thing.
    
    Your renewal is the standard delegation token renewal: you create it, and renew it periodically until it reaches its lifetime. In the YARN case, the YARN RM does that renewal for you, so your tokens will remain valid up to when the application finishes or the token expires, whatever happens first.
    
    The renewal this code talks about is not that. It's meant to work around the upper bound described above. Without it, your app has a maximum lifetime which is the lifetime of the token. This code gives Spark the ability to create new tokens and distribute them to new and already running executors, so the app keeps working indefinitely.
    
    In this mode, Spark doesn't "renew" tokens in the sense you talk about, it just creates new ones once it's getting close to the renewal time.
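    That "create new tokens before the old ones lapse" loop could be sketched like this; the 75% ratio and all names are illustrative, not Spark's actual internals:

```scala
import java.util.concurrent.{ScheduledExecutorService, TimeUnit}

// Fire token re-creation at ~75% of the renewal interval so fresh tokens
// reach executors before the old ones stop working (ratio is illustrative).
def creationDelayMs(renewalIntervalMs: Long): Long =
  (renewalIntervalMs * 0.75).toLong

// Periodically create new tokens and push them to executors.
def scheduleTokenCreation(scheduler: ScheduledExecutorService,
                          renewalIntervalMs: Long,
                          createAndDistribute: () => Unit): Unit = {
  val delay = creationDelayMs(renewalIntervalMs)
  scheduler.scheduleAtFixedRate(
    new Runnable { def run(): Unit = createAndDistribute() },
    delay, delay, TimeUnit.MILLISECONDS)
}
```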
    
    But for that to work you need a Kerberos login. Thus, you need a principal and a keytab, unless you want to log in to the driver's host and kinit every time the TGT needs renewal.
    
    On the other hand, there is no reason why this code couldn't try to do what you mean by renewal. But I'm pretty sure that renewal still requires a valid kerberos login. And just like delegation tokens, TGTs expire. The service talked about in (1) above could help, but this is the only current solution (bar YARN) where Spark creates its own tokens, and also allows the application to run past the token's expiration.
    
    There are also other caveats, like certain DTs (like Hive's) not being renewable.
    
    > The design was specifically for the popular use-case in which for cluster-mode we would not send keytabs around and instead read the DT from a secret. 
    
    See above what the scenario that requires keytabs is about. It's not great to distribute keytabs, but that's the only current solution to that problem.
    
    > cases where the Driver is over-worked in running this renewal service
    
    This code runs once a day (with the usual service configs) and just performs a few RPCs to get new tokens and send them to executors. It's hardly a heavyweight thing.


---



[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22704
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22704#discussion_r226127311
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
    @@ -49,8 +49,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
         val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
     
         // Get the token renewal interval if it is not set. It will only be called once.
    -    if (tokenRenewalInterval == null) {
    -      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
    +    // If running a Kerberos job on Kubernetes, you may specify that you wish to not
    +    // obtain the tokenRenewal interval, as the renewal service may be external.
    --- End diff --
    
    > And BTW, I know what you mean when you mention an external renewal service. But again, that does not exist, and until it does, you should not do things that assume its existence.
    
    Right, so in Spark there is no token renewal service ATM, but an external token service that places tokens into Secrets may exist within a company organization, no? Whether or not they leverage one provided by Spark. So I personally thought such a comment was sensible, but I'll remove it based on your reasoning.
    
    > If you're in this code, there are two options:
    
    Ah, very good point: checking for the presence of a keytab/principal as a flag, given that
    
    > It has always been, and always will be, the way to tell Spark that you want Spark to renew tokens itself
    
    makes sense
    
    > The current k8s backend is broken in that regard.
    
    The design was specifically for the popular use case in which, for cluster mode, we would not send keytabs around and would instead read the DT from a secret. So true, it does break the traditional design, because we are using a keytab without enabling renewal. In concert with your work, the next steps would be to parallel the other resource managers by allowing the option to use the renewal code if the keytab and principal are already in the driver. Just out of interest, have there been any cases where the driver is overworked by running this renewal service and managing the DTs for many executors? In essence, could there be any convincing use case for having the driver use the keytab for login but not do its own renewal, as it might not be able to handle the load?


---
