Posted to reviews@spark.apache.org by victor-wong <gi...@git.apache.org> on 2017/05/10 09:46:31 UTC

[GitHub] spark pull request #17937: Reload credentials file config when app starts wi...

GitHub user victor-wong opened a pull request:

    https://github.com/apache/spark/pull/17937

    Reload credentials file config when app starts with checkpoint file i…

    ## What changes were proposed in this pull request?
    
    Currently, the credentials file configuration is recovered from the checkpoint file when a Spark Streaming application is restarted, which leads to unwanted behavior. For example:
    
    1. Submit a Spark Streaming application using a keytab file, with checkpointing enabled, in yarn-cluster mode.
    
    > spark-submit --master yarn-cluster --principal xxxx --keytab xxx ...
    
    2. Stop the Spark Streaming application.
    3. Resubmit the application after a period of time (e.g. one day).
    4. The credentials file configuration is recovered from the checkpoint file, so the value of "spark.yarn.credentials.file" points to the old staging directory (e.g. hdfs://xxxx/.sparkStaging/application_xxxx/credentials-xxxx, where application_xxxx is the application id of the previous application, which was stopped).
    5. When an executor is launched, ExecutorDelegationTokenUpdater updates credentials from the credentials file immediately. Because the credentials file was generated a day ago (or earlier), the credentials it contains have already expired, so after a period of time the executor keeps failing.
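
The restart scenario described above can be illustrated with a small sketch. It is not Spark code: the helper names, the application-id pattern, and the sample ids and paths are all assumptions made up for illustration. It only shows how a credentials path recovered from a checkpoint can be recognized as belonging to an earlier application.

```python
import re
from typing import Optional

# Hypothetical helper names and sample values; this is not Spark's code.
# Assumes YARN application ids of the form application_<digits>_<digits>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def app_id_in_path(credentials_path: str) -> Optional[str]:
    """Return the application id embedded in a staging path, or None."""
    match = APP_ID_PATTERN.search(credentials_path)
    return match.group(0) if match else None


def is_stale_credentials_file(recovered_conf: dict, current_app_id: str) -> bool:
    """True when the recovered path does not belong to the current application.

    A path without any application id is also treated as stale.
    """
    path = recovered_conf.get("spark.yarn.credentials.file")
    if path is None:
        return False  # nothing was recovered, so nothing can be stale
    return app_id_in_path(path) != current_app_id


# Configuration as it might come back from the checkpoint (made-up values).
recovered = {
    "spark.yarn.credentials.file":
        "hdfs://nn/user/a/.sparkStaging/application_1493276888098_0001/credentials-abc",
}
stale = is_stale_credentials_file(recovered, "application_1493363288098_0007")
print(stale)  # True: the path points at the previous application's staging dir
```

In this sketch the resubmitted application has a new id, so the recovered path is flagged; an executor reading that file would pick up credentials written by the earlier run.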
    
    Some useful logs are shown below:
    
    >2017-04-27,15:08:08,098 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend: Will periodically update credentials from: hdfs://xxxx/application_xxxx/credentials-xxxx
    >2017-04-27,15:08:12,519 INFO org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater: Reading new delegation tokens from hdfs://xxxx/application_1xxxx/credentials-xxxx-xx
    >2017-04-27,15:08:12,661 INFO org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater: Tokens updated from credentials file.
    ...
    >2017-04-27,15:08:48,156 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token xxxx for xx) can't be found in cache
    
    
    
    ## How was this patch tested?
    
    manual tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/victor-wong/spark fix-credential-file

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17937.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17937
    
----
commit fac97c69b8087fda62b776384539301df0230ae2
Author: jiasheng.wang <wa...@xiaomi.com>
Date:   2017-05-10T09:35:11Z

    Reload credentials file config when app starts with checkpoint file in cluster mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17937: Reload credentials file config when app starts with chec...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17937
  
    This is already fixed in https://github.com/apache/spark/pull/18230. CC @gatorsmile.




[GitHub] spark issue #17937: Reload credentials file config when app starts with chec...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17937
  
    @victor-wong can you please update the PR title to follow the format of other PRs?
    
    From your description, the log seems to come from an old Spark version. In the latest Spark there is no `ExecutorDelegationTokenUpdater`; it has been renamed to `CredentialUpdater`. Also, `CredentialUpdater` does not update the credentials immediately at start; that is controlled by `spark.yarn.credentials.updateTime`. Can you please check whether your problem still exists in the latest master code, and which exception is thrown?
    
    Also, I would guess that some more internal configurations should be excluded from the checkpoint, such as "spark.yarn.credentials.renewalTime" and "spark.yarn.credentials.updateTime".
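
A minimal sketch of the exclusion idea discussed above, using plain dictionaries instead of Spark's configuration classes. The function name `merge_conf` and all sample values are hypothetical; only the three configuration keys come from this thread. The checkpointed value of each listed key is discarded and replaced by the value from the freshly submitted application, if there is one.

```python
# Keys named in this thread. Treating them as "reload on restart" is the
# assumption this sketch illustrates; it is not a copy of Spark's code.
KEYS_TO_RELOAD = (
    "spark.yarn.credentials.file",
    "spark.yarn.credentials.renewalTime",
    "spark.yarn.credentials.updateTime",
)


def merge_conf(checkpointed: dict, fresh: dict, keys_to_reload=KEYS_TO_RELOAD) -> dict:
    """Start from the checkpointed settings, but reload the listed keys."""
    merged = dict(checkpointed)
    for key in keys_to_reload:
        merged.pop(key, None)       # never keep the checkpointed value
        if key in fresh:
            merged[key] = fresh[key]  # use the new submission's value
    return merged


# Made-up sample values for illustration only.
checkpointed = {
    "spark.app.name": "streaming-job",
    "spark.yarn.credentials.file": "hdfs://nn/.sparkStaging/application_old/credentials-1",
    "spark.yarn.credentials.updateTime": "1000",
}
fresh = {
    "spark.yarn.credentials.file": "hdfs://nn/.sparkStaging/application_new/credentials-2",
}
merged = merge_conf(checkpointed, fresh)
print(merged)
```

One design choice in this sketch: a listed key that is absent from the fresh configuration is dropped rather than kept from the checkpoint, so a stale update time cannot survive a restart.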
    





[GitHub] spark issue #17937: Reload credentials file config when app starts with chec...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17937
  
    ping @victor-wong, how is it going?




[GitHub] spark issue #17937: Reload credentials file config when app starts with chec...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17937
  
    Besides, I guess this issue only exists in yarn-cluster mode; can you also verify that?




[GitHub] spark pull request #17937: Reload credentials file config when app starts wi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17937




[GitHub] spark issue #17937: Reload credentials file config when app starts with chec...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17937
  
    We are closing it due to inactivity. Please reopen it if you want to push it forward. Thanks!




[GitHub] spark issue #17937: Reload credentials file config when app starts with chec...

Posted by victor-wong <gi...@git.apache.org>.
Github user victor-wong commented on the issue:

    https://github.com/apache/spark/pull/17937
  
    Comments on last PR, https://github.com/apache/spark/pull/17782.




[GitHub] spark issue #17937: Reload credentials file config when app starts with chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17937
  
    Can one of the admins verify this patch?

