Posted to reviews@spark.apache.org by morenn520 <gi...@git.apache.org> on 2017/05/05 07:55:36 UTC

[GitHub] spark pull request #17870: [SPARK-20608] allow standby namenodes in spark.ya...

GitHub user morenn520 opened a pull request:

    https://github.com/apache/spark/pull/17870

    [SPARK-20608] allow standby namenodes in spark.yarn.access.namenodes

    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-20608
    
    ## How was this patch tested?
    
    spark-submit configuration: spark.yarn.access.namenodes=hdfs://namenode01,hdfs://namenode02
    Spark application code:
    dataframe.write.parquet(getActiveNameNode(...) + hdfsPath)
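Each entry in that comma-separated value is contacted for a delegation token at submission time (the trace below fails inside `obtainTokensForNamenodes`, called from `Client.submitApplication`). The following is a minimal sketch of that expansion. It assumes entries are trimmed, de-duplicated, and combined with the staging directory's filesystem. `NamenodeListSketch`, `namenodesToAccess`, and `hdfs://nameservice1` are hypothetical names, not Spark's API. A three-element set of this kind would be consistent with the `Set$Set3.foreach` frame in the trace, though that is an inference.

```scala
import java.net.URI

// Hypothetical helper (not Spark's code): expand a comma-separated
// spark.yarn.access.namenodes-style value into the set of filesystems that
// would each be asked for a delegation token.
object NamenodeListSketch {
  def namenodesToAccess(confValue: String, stagingFs: URI): Set[URI] = {
    val listed = confValue
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty) // tolerate stray spaces and trailing commas
      .map(new URI(_))
      .toSet
    listed + stagingFs // assumption: the staging directory's FS is always added
  }

  def main(args: Array[String]): Unit = {
    val targets = namenodesToAccess(
      "hdfs://namenode01,hdfs://namenode02",
      new URI("hdfs://nameservice1")) // hypothetical staging filesystem
    // Each entry, including whichever namenode is currently standby,
    // receives a token request.
    targets.toSeq.map(_.toString).sorted.foreach(println)
  }
}
```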
    
    Before this patch:
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
    	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1691)
    	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1322)
    	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7079)
    	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:505)
    	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getDelegationToken(AuthorizationProviderProxyClientProtocol.java:637)
    	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:957)
    	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
    	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:415)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
    	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
    
    	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
    	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
    	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    	at com.sun.proxy.$Proxy14.getDelegationToken(Unknown Source)
    	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:901)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    	at com.sun.proxy.$Proxy15.getDelegationToken(Unknown Source)
    	at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:988)
    	at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1316)
    	at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
    	at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
    	at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2002)
    	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:135)
    	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:131)
    	at scala.collection.immutable.Set$Set3.foreach(Set.scala:115)
    	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokensForNamenodes(YarnSparkHadoopUtil.scala:131)
    	at org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:701)
    	at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:730)
    	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:833)
    	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
    	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1119)
    	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
    	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    	... 7 more
    
    After applying this patch:
    The submission succeeds.
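The server-side frames of the trace above show the sequence: `FSNamesystem.getDelegationToken` calls `checkOperation`, and `StandbyState.checkOperation` rejects the call because token issuance is checked as a WRITE. The following is a simplified model of that check, not Hadoop's code. The names are stand-ins. The model assumes a standby serves only unchecked operations, plus reads when stale reads are allowed.

```scala
// Simplified model of a namenode's HA operation check. All names are stand-ins;
// this is not Hadoop's implementation.
object HaCheckSketch {
  sealed abstract class OpCategory(val name: String)
  case object Unchecked extends OpCategory("UNCHECKED")
  case object Read extends OpCategory("READ")
  case object Write extends OpCategory("WRITE")

  // Stand-in for org.apache.hadoop.ipc.StandbyException.
  final class StandbyException(msg: String) extends java.io.IOException(msg)

  // Assumption: a standby serves only unchecked operations, plus reads when
  // stale reads are explicitly allowed.
  def checkOperation(isActive: Boolean, op: OpCategory, allowStaleReads: Boolean = false): Unit = {
    val allowed = isActive || op == Unchecked || (op == Read && allowStaleReads)
    if (!allowed)
      throw new StandbyException(s"Operation category ${op.name} is not supported in state standby")
  }

  // Token issuance is checked as a WRITE, matching the server-side frames
  // (FSNamesystem.getDelegationToken -> checkOperation) in the trace.
  def getDelegationToken(isActive: Boolean): String = {
    checkOperation(isActive, Write)
    "dummy-token" // placeholder; a real namenode returns a Token object
  }

  def main(args: Array[String]): Unit = {
    println(getDelegationToken(isActive = true))
    try println(getDelegationToken(isActive = false))
    catch { case e: StandbyException => println(e.getMessage) }
  }
}
```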
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/morenn520/spark SPARK-20608

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17870
    
----
commit e6f77e066c0780352e036aa19173508d02b836ef
Author: Chen Yuechen <ch...@qiyi.com>
Date:   2017-05-04T13:21:27Z

    allow standby namenodes in spark.yarn.access.namenodes
    
    Change-Id: Id0eedfbd594b24d2a3c283a9b5febdb6042c4dd1

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17870: [SPARK-20608] allow standby namenodes in spark.ya...

Posted by morenn520 <gi...@git.apache.org>.
Github user morenn520 closed the pull request at:

    https://github.com/apache/spark/pull/17870


[GitHub] spark issue #17870: [SPARK-20608] allow standby namenodes in spark.yarn.acce...

Posted by morenn520 <gi...@git.apache.org>.
Github user morenn520 commented on the issue:

    https://github.com/apache/spark/pull/17870
  
    @jerryshao "Why not submit a PR against the master branch?"
    Sorry, I didn't find the yarn module in the master branch. If this patch is accepted, I will spend some time reading the code in the master branch. So far, I have applied this patch to Spark 2.0.1 and 2.1.0 in our internal Spark build.
    
    "From my understanding, your patch catches the exception and continues to get tokens from the other filesystems, right?"
    Yes, that's right. I don't think a StandbyException should cause a RuntimeException to be thrown; skipping the standby namenode has no ill effect.
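The behaviour described here can be sketched as follows: catch the failure for one namenode and continue with the others. The exception classes are simplified stand-ins for Hadoop's `org.apache.hadoop.ipc.RemoteException` and `StandbyException`. The real `RemoteException` provides `unwrapRemoteException(...)` for the unwrapping step. `collectTokens` and `fetch` are hypothetical.

```scala
import java.io.IOException

// Sketch of "catch the failure per namenode and continue". The two exception
// classes are simplified stand-ins for org.apache.hadoop.ipc.StandbyException
// and org.apache.hadoop.ipc.RemoteException; collectTokens and fetch are hypothetical.
object SkipStandbySketch {
  final class StandbyException(msg: String) extends IOException(msg)

  // Over RPC the client only receives the server-side class name, wrapped.
  final class RemoteException(val className: String, msg: String) extends IOException(msg) {
    def unwrap(): IOException =
      if (className == "org.apache.hadoop.ipc.StandbyException") new StandbyException(getMessage)
      else this
  }

  def collectTokens(namenodes: Seq[String], fetch: String => String): Seq[String] =
    namenodes.flatMap { nn =>
      try Some(fetch(nn))
      catch {
        case e: RemoteException =>
          e.unwrap() match {
            case _: StandbyException =>
              println(s"WARN: namenode $nn is in state standby; skipping it")
              None
            case other => throw other // any other server-side error still aborts
          }
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical fetcher in which namenode02 plays the standby.
    val fetch: String => String = {
      case "hdfs://namenode02" =>
        throw new RemoteException("org.apache.hadoop.ipc.StandbyException",
          "Operation category WRITE is not supported in state standby")
      case nn => s"token-for-$nn"
    }
    println(collectTokens(Seq("hdfs://namenode01", "hdfs://namenode02"), fetch))
  }
}
```

Only failures that unwrap to a StandbyException are skipped. Anything else still aborts, so a misconfigured or unreachable namenode is not silently ignored.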


[GitHub] spark issue #17870: [SPARK-20608] allow standby namenodes in spark.yarn.acce...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17870
  
    Can one of the admins verify this patch?


[GitHub] spark pull request #17870: [SPARK-20608] allow standby namenodes in spark.ya...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17870#discussion_r114972572
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/security/HDFSCredentialProvider.scala ---
    @@ -75,8 +84,15 @@ private[security] class HDFSCredentialProvider extends ServiceCredentialProvider
         sparkConf.get(PRINCIPAL).flatMap { renewer =>
           val creds = new Credentials()
           nnsToAccess(hadoopConf, sparkConf).foreach { dst =>
    -        val dstFs = dst.getFileSystem(hadoopConf)
    -        dstFs.addDelegationTokens(renewer, creds)
    +          try {
    +            val dstFs = dst.getFileSystem(hadoopConf)
    +            dstFs.addDelegationTokens(renewer, creds)
    +          } catch {
    +            case e: StandbyException =>
    +              logWarning(s"Namenode ${dst} is in state standby", e)
    +            case e: RemoteException =>
    +              logWarning(s"Namenode ${dst} is in state standby", e)
    +          }
    --- End diff --
    
    HADOOP-13372 implies that swift:// throws an UnknownHostException here; it is best to catch and log that too, in case someone adds swift:// to the list of filesystems.
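Following that suggestion, the catch clauses could be folded into one classification step. The sketch below assumes a simplified stand-in for `org.apache.hadoop.ipc.RemoteException`. `TokenFailureSketch`, `classify`, and the `Outcome` values are hypothetical. Unlike the diff above, it treats a `RemoteException` as standby only when the wrapped class name is `org.apache.hadoop.ipc.StandbyException`. As a result, unrelated remote errors are not logged as "in state standby".

```scala
import java.io.IOException
import java.net.UnknownHostException

// Hypothetical classifier for token-fetch failures. RemoteException is a
// simplified stand-in for org.apache.hadoop.ipc.RemoteException, which carries
// the class name of the exception thrown on the server.
object TokenFailureSketch {
  final class RemoteException(val className: String, msg: String) extends IOException(msg)

  sealed trait Outcome
  case object SkipStandby extends Outcome          // log a warning, try the next filesystem
  case object SkipUnresolvableHost extends Outcome // e.g. swift:// (HADOOP-13372)
  case object Fail extends Outcome                 // propagate and abort submission

  def classify(t: Throwable): Outcome = t match {
    case r: RemoteException if r.className == "org.apache.hadoop.ipc.StandbyException" =>
      SkipStandby
    case _: UnknownHostException => SkipUnresolvableHost
    case _ => Fail // includes RemoteExceptions that do not wrap a StandbyException
  }

  def main(args: Array[String]): Unit = {
    val samples: Seq[Throwable] = Seq(
      new RemoteException("org.apache.hadoop.ipc.StandbyException", "standby namenode"),
      new UnknownHostException("unresolvable swift endpoint"),
      new RemoteException("org.apache.hadoop.security.AccessControlException", "permission denied"))
    samples.foreach(t => println(s"${t.getMessage} -> ${classify(t)}"))
  }
}
```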


[GitHub] spark issue #17870: [SPARK-20608] allow standby namenodes in spark.yarn.acce...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17870
  
    Why not submit a PR against the master branch?
    
    From my understanding, your patch catches the exception and continues to get tokens from the other filesystems, right?


[GitHub] spark issue #17870: [SPARK-20608] allow standby namenodes in spark.yarn.acce...

Posted by morenn520 <gi...@git.apache.org>.
Github user morenn520 commented on the issue:

    https://github.com/apache/spark/pull/17870
  
    @srowen Thanks. See the PR against the master branch: https://github.com/apache/spark/pull/17872


[GitHub] spark issue #17870: [SPARK-20608] allow standby namenodes in spark.yarn.acce...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17870
  
    @morenn520 https://github.com/apache/spark/tree/master/resource-managers/yarn ?


[GitHub] spark issue #17870: [SPARK-20608] allow standby namenodes in spark.yarn.acce...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17870
  
    You need to close this one @morenn520 

