Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/05/02 14:09:51 UTC

[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/21216

    [SPARK-24149][YARN] Retrieve all federated namespaces tokens

    ## What changes were proposed in this pull request?
    
    Hadoop 3 introduces HDFS federation, which allows multiple namespaces on the same HDFS cluster. In Spark, we need to request a delegation token from the namenode of each namespace; otherwise, accessing any namespace other than the default one (for which we already fetch the delegation token) fails.
    
    The PR adds the automatic discovery of all the namenodes related to all the namespaces available according to the configs in hdfs-site.xml.
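
A minimal sketch of the discovery step described above, assuming the settings from hdfs-site.xml are available as a plain `Map[String, String]`. The object and method names are hypothetical, and the sketch returns URI strings rather than Hadoop `FileSystem` objects, so it is an illustration and not the code of the patch.

```scala
object NameserviceDiscovery {
  // Hypothetical helper: `conf` stands in for the key/value pairs of hdfs-site.xml.
  def nonHaNamenodeUris(conf: Map[String, String]): Seq[String] = {
    // dfs.nameservices is a comma-separated list of nameservice IDs.
    val nameservices = conf.get("dfs.nameservices")
      .toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)

    // A nameservice without HA exposes a single RPC address under this key.
    // Nameservices with no such key are skipped instead of failing.
    nameservices.flatMap { ns =>
      conf.get(s"dfs.namenode.rpc-address.$ns").map { addr => s"hdfs://$addr" }
    }
  }
}
```

Each returned URI identifies one namenode for which a delegation token would then be requested.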
    
    ## How was this patch tested?
    
    manual tests in dockerized env


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-24149

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21216.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21216
    
----
commit dfdd957c15a43bb601b0ca287b7a84e6c326c4c0
Author: Marco Gaido <ma...@...>
Date:   2018-04-29T08:56:29Z

    [SPARK-24149][YARN] Retrieve all federated namespaces tokens

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90791/
    Test FAILed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90504/testReport)** for PR 21216 at commit [`74c788f`](https://github.com/apache/spark/commit/74c788f46ca0ed5c13c2b72152cb0c7f741164f3).


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    
    > I disagree that it is federation. It's just declaring multiple HDFS services in the same config file. 
    
    I am using the terminology that is used on the Hadoop website. The configuration I am using is the basic one suggested in the Federation Configuration section, as you can see here: https://hadoop.apache.org/docs/r2.8.3/hadoop-project-dist/hadoop-hdfs/Federation.html#Federation_Configuration.
    
    What you are referring to as federation is described on the Hadoop website as federation + ViewFS: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ViewFs.html#Appendix:_A_Mount_Table_Configuration_Example.
    
    Anyway, there are two possible configurations:
    
     1. What I refer to as federation (without ViewFS), which without this change fails to write to any namespace other than the default one;
     2. ViewFS enabled, where this change is not needed. In this case, the risk with this change is that we handle the same thing twice.
    
    So, what about adding a check if viewfs is enabled: if so we skip the code added here; if not, we do add all the namespaces. In this way all scenarios should be covered. What do you think?
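
The proposed check can be sketched as follows. This is a simplified illustration under stated assumptions: `ViewFsGuard`, `extraNamespaces`, and the `discovered` parameter are hypothetical names, and namespaces are represented as URI strings instead of Hadoop `FileSystem` objects.

```scala
import java.net.URI

object ViewFsGuard {
  // Hypothetical helper: decides which additional namespaces need their own
  // delegation token, given the URI of the staging directory.
  def extraNamespaces(stagingDir: URI, discovered: Set[String]): Set[String] =
    if (stagingDir.getScheme == "viewfs") {
      // ViewFS already handles tokens for the namespaces behind its mount
      // points, so adding them again would duplicate work.
      Set.empty
    } else {
      // Plain federation: every discovered namespace needs a token.
      discovered
    }
}
```

Using the staging directory's scheme as the signal for "ViewFS is enabled" is a design choice of this sketch; a cluster could have ViewFS configured while the job still addresses paths through `hdfs://` URIs.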


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3288/
    Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Thanks for the review @jerryshao, I updated the PR to address the comments.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2811/
    Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    @jerryshao @vanzin any other comment?


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    @jerryshao @mridulm @vanzin could you please review this when you have time? Thanks.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    @vanzin yes, in the example I am providing the hdfs URI, without using ViewFS. Actually, ViewFS was not even configured in that case. My test cluster had just two different namespaces, and only the default one can be accessed without this change.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90056/testReport)** for PR 21216 at commit [`dfdd957`](https://github.com/apache/spark/commit/dfdd957c15a43bb601b0ca287b7a84e6c326c4c0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90793/testReport)** for PR 21216 at commit [`4c0cd61`](https://github.com/apache/spark/commit/4c0cd61a9eb251e3cd0a9ef835a6f4bfedad9a90).


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    I'm not very familiar with federated HDFS, but is it transparent to downstream applications like Spark, or should Spark know all the configured NNs? If it is transparent, then I think the token acquisition mechanism should also be transparent to Spark, and Spark doesn't need to know all the configured NNs.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Is this really necessary? I was under the impression the HDFS libraries would do that for you (and that's what we observed in tests, assuming our tests were correct). If you look at the code that's what seems to happen:
    
    https://github.com/apache/hadoop/blob/branch-3.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java#L615
    
    Unless Spark is somehow resolving the viewfs URI to hdfs? Or maybe you're providing the hdfs URI instead of the viewfs one?


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90195/
    Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90717/
    Test PASSed.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21216


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90717/testReport)** for PR 21216 at commit [`57ad32f`](https://github.com/apache/spark/commit/57ad32fb106f5b443c4b0ce2741992ec74cacc34).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    > This is a valid configuration and it is the easiest configuration for HDFS federation. 
    
    I disagree that it is federation. It's just declaring multiple HDFS services in the same config file. Federation means multiple services with a single namespace, using mount points, and you don't have that here.
    
    That doesn't mean that you can't add this code. But at that point I'd like to see what an actual federated HDFS configuration looks like (don't remember off the top of my head), since you shouldn't be doing duplicate work. Things that are handled by viewfs should not be handled again by this code.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r189240461
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -200,7 +200,31 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .getOrElse(FileSystem.get(hadoopConf))
     
    -    filesystemsToAccess + stagingFS
    +    // Add the list of available namenodes for all namespaces in HDFS federation.
    +    // If ViewFS is enabled, this is skipped as ViewFS already handles delegation tokens for its
    +    // namespaces.
    +    val hadoopFilesystems = if (stagingFS.getScheme == "viewfs") {
    +      Set.empty
    +    } else {
    +      val nameservices = hadoopConf.getTrimmedStrings("dfs.nameservices")
    +      // Retrieving the filesystem for the nameservices where HA is not enabled
    +      val filesystemsWithoutHA = nameservices.flatMap { ns =>
    +        hadoopConf.get(s"dfs.namenode.rpc-address.$ns") match {
    +          case null => None
    +          case nameNode => Some(new Path(s"hdfs://$nameNode").getFileSystem(hadoopConf))
    --- End diff --
    
    Maybe we can change to `Option(hadoopConf.get(xxxx)).map {xxx}` for simplicity.
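
The suggested simplification can be illustrated with a small sketch. It assumes `java.util.Properties` as a stand-in for the Hadoop configuration object, since `getProperty` likewise returns `null` for a missing key; the object and method names are hypothetical.

```scala
import java.util.Properties

object NullSafeLookup {
  // Pattern-match version: the null case is spelled out by hand.
  def viaMatch(props: Properties, key: String): Option[String] =
    props.getProperty(key) match {
      case null => None
      case value => Some(s"hdfs://$value")
    }

  // Option(...) turns null into None, so a plain map is enough.
  def viaOption(props: Properties, key: String): Option[String] =
    Option(props.getProperty(key)).map { value => s"hdfs://$value" }
}
```

Both methods return the same result for present and missing keys; the second form simply drops the explicit `null` branch.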


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2924/
    Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3334/
    Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r189255656
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -200,7 +200,27 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .getOrElse(FileSystem.get(hadoopConf))
     
    -    filesystemsToAccess + stagingFS
    +    // Add the list of available namenodes for all namespaces in HDFS federation.
    +    // If ViewFS is enabled, this is skipped as ViewFS already handles delegation tokens for its
    +    // namespaces.
    +    val hadoopFilesystems = if (stagingFS.getScheme == "viewfs") {
    +      Set.empty
    +    } else {
    +      val nameservices = hadoopConf.getTrimmedStrings("dfs.nameservices")
    +      // Retrieving the filesystem for the nameservices where HA is not enabled
    +      val filesystemsWithoutHA = nameservices.flatMap { ns =>
    +        Option(hadoopConf.get(s"dfs.namenode.rpc-address.$ns")).map(nameNode =>
    --- End diff --
    
    I think you should use `{}` if this `map` is split across two lines: 
    
    ```
    Option(xxx).map { xx =>
      foo
    }
    ```


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r186024263
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -196,11 +196,17 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .toSet
     
    +    // add the list of available namenodes for all namespaces in HDFS federation
    +    val hadoopFilesystems = Option(hadoopConf.get("dfs.nameservices"))
    +      .toSeq.flatMap(_.split(","))
    +      .map(ns => hadoopConf.get(s"dfs.namenode.rpc-address.$ns"))
    --- End diff --
    
    if that namespace is listed in the `dfs.nameservices` config, this should exist, otherwise it is not a valid configuration. Shall we check for null in case we get an invalid config?


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merging to master.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90056/testReport)** for PR 21216 at commit [`dfdd957`](https://github.com/apache/spark/commit/dfdd957c15a43bb601b0ca287b7a84e6c326c4c0).


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90056/
    Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    @jerryshao @vanzin any more comments?


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    I'm OK with the current fix, just some minor style comments.
    
    @vanzin would you please take another look? Thanks!


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90504/testReport)** for PR 21216 at commit [`74c788f`](https://github.com/apache/spark/commit/74c788f46ca0ed5c13c2b72152cb0c7f741164f3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r187134476
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -200,7 +200,19 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .getOrElse(FileSystem.get(hadoopConf))
     
    -    filesystemsToAccess + stagingFS
    +    // add the list of available namenodes for all namespaces in HDFS federation
    +    // if ViewFS is enabled, this is skipped as ViewFS already handles delegation tokens
    +    // for its namespaces
    +    val hadoopFilesystems = if (stagingFS.getScheme == "viewfs") {
    +      Set.empty
    +    } else {
    +      Option(hadoopConf.get("dfs.nameservices"))
    +        .toSeq.flatMap(_.split(","))
    +        .flatMap(ns => Option(hadoopConf.get(s"dfs.namenode.rpc-address.$ns")))
    --- End diff --
    
    style is `.foo { bar => ... }`
    
    This also will not work for HA, since there's no direct "rpc-address" like this in that case, and you need to use the namespace URI.
    
    You should probably filter out the staging dir FS in that case, too, although maybe it's already taken care of (since `filesystemsToAccess` is a set).
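
One way to address the HA and staging-directory points above, sketched under stated assumptions: the configuration is modeled as a plain `Map[String, String]`, results are URI strings rather than `FileSystem` objects, and the presence of the standard `dfs.ha.namenodes.<nameservice>` key is taken as the signal that a nameservice is HA-enabled. The names `HaAwareDiscovery` and `namespaceUris` are hypothetical.

```scala
object HaAwareDiscovery {
  def namespaceUris(conf: Map[String, String], stagingUri: String): Set[String] = {
    val nameservices = conf.get("dfs.nameservices")
      .toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)

    val uris = nameservices.flatMap { ns =>
      if (conf.contains(s"dfs.ha.namenodes.$ns")) {
        // HA: there is no single rpc-address for the nameservice, so the
        // logical namespace URI is used instead of a host:port pair.
        Some(s"hdfs://$ns")
      } else {
        // Non-HA: resolve the single namenode RPC address, if configured.
        conf.get(s"dfs.namenode.rpc-address.$ns").map { addr => s"hdfs://$addr" }
      }
    }

    // Set semantics drop the duplicate when the staging filesystem is
    // also one of the discovered namespaces.
    uris.toSet + stagingUri
  }
}
```

Because the result is a `Set`, a staging URI that matches a discovered namespace appears only once. With real `FileSystem` objects, deduplication would depend on their equality semantics, which this sketch does not model.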


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90212/
    Test PASSed.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r187134360
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -200,7 +200,19 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .getOrElse(FileSystem.get(hadoopConf))
     
    -    filesystemsToAccess + stagingFS
    +    // add the list of available namenodes for all namespaces in HDFS federation
    +    // if ViewFS is enabled, this is skipped as ViewFS already handles delegation tokens
    +    // for its namespaces
    +    val hadoopFilesystems = if (stagingFS.getScheme == "viewfs") {
    +      Set.empty
    +    } else {
    +      Option(hadoopConf.get("dfs.nameservices"))
    --- End diff --
    
    `hadoopConf.getTrimmedStrings`?
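
    A small sketch of why trimming matters here. The trimmedStrings helper below is hypothetical and only approximates the trimming behavior on a raw string; it is not the Hadoop API itself. A bare `split(",")` keeps the whitespace around each entry, which would end up inside the config key that is looked up next.

```scala
object TrimmedSplit {
  // Hypothetical helper: splits a comma-separated value and trims each entry.
  // A null or blank input yields an empty array instead of failing.
  def trimmedStrings(raw: String): Array[String] =
    if (raw == null || raw.trim.isEmpty) Array.empty[String]
    else raw.trim.split("\\s*,\\s*")
}
```

    With a value such as "ns1, ns2", the plain split yields " ns2" with a leading space, while the trimmed variant yields "ns2".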


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2912/


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r186059087
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -196,11 +196,17 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .toSet
     
    +    // add the list of available namenodes for all namespaces in HDFS federation
    +    val hadoopFilesystems = Option(hadoopConf.get("dfs.nameservices"))
    +      .toSeq.flatMap(_.split(","))
    +      .map(ns => hadoopConf.get(s"dfs.namenode.rpc-address.$ns"))
    --- End diff --
    
    I think it is good to check for null; at the very least it does no harm to the current code.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90195/testReport)** for PR 21216 at commit [`aeb6219`](https://github.com/apache/spark/commit/aeb6219f929ae1997ee1823143c08f22460c88af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    > So, what about adding a check if viewfs is enabled: if so we skip the code added here
    
    That sounds ok.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    @vanzin thanks, I updated the PR accordingly.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90791/testReport)** for PR 21216 at commit [`6e4bf1b`](https://github.com/apache/spark/commit/6e4bf1b16e98316fb130a7ef53fbbd8a1ecead13).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90793/


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2911/


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3135/


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90212/testReport)** for PR 21216 at commit [`94048a5`](https://github.com/apache/spark/commit/94048a5efcdc5ffad0925f43b0c6e87edcf78b08).


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3335/


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90212/testReport)** for PR 21216 at commit [`94048a5`](https://github.com/apache/spark/commit/94048a5efcdc5ffad0925f43b0c6e87edcf78b08).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90504/


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r187134147
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -200,7 +200,19 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .getOrElse(FileSystem.get(hadoopConf))
     
    -    filesystemsToAccess + stagingFS
    +    // add the list of available namenodes for all namespaces in HDFS federation
    --- End diff --
    
    nit: needs some punctuation, sentences start with capital letters.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    @vanzin I am not sure what you mean by recommended. This is a valid configuration, and it is the easiest configuration for HDFS federation: HDFS federation without ViewFS. I expect many users to be in this situation, especially with Hive tables. They can have one table in one namespace and another table in a different namespace, each with the proper location, without any need for ViewFS.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Is that the recommended way of deploying federated HDFS, though? I've always seen it deployed with viewfs as the defaultFS, which automatically handles all this.
    
    The way you've deployed, it just looks like accessing two unrelated HDFS instances, not a single namespace.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90793/testReport)** for PR 21216 at commit [`4c0cd61`](https://github.com/apache/spark/commit/4c0cd61a9eb251e3cd0a9ef835a6f4bfedad9a90).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r189240503
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -200,7 +200,31 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .getOrElse(FileSystem.get(hadoopConf))
     
    -    filesystemsToAccess + stagingFS
    +    // Add the list of available namenodes for all namespaces in HDFS federation.
    +    // If ViewFS is enabled, this is skipped as ViewFS already handles delegation tokens for its
    +    // namespaces.
    +    val hadoopFilesystems = if (stagingFS.getScheme == "viewfs") {
    +      Set.empty
    +    } else {
    +      val nameservices = hadoopConf.getTrimmedStrings("dfs.nameservices")
    +      // Retrieving the filesystem for the nameservices where HA is not enabled
    +      val filesystemsWithoutHA = nameservices.flatMap { ns =>
    +        hadoopConf.get(s"dfs.namenode.rpc-address.$ns") match {
    +          case null => None
    +          case nameNode => Some(new Path(s"hdfs://$nameNode").getFileSystem(hadoopConf))
    +        }
    +      }
    +      // Retrieving the filesystem for the nameservices where HA is enabled
    +      val filesystemsWithHA = nameservices.flatMap { ns =>
    +        hadoopConf.get(s"dfs.ha.namenodes.$ns") match {
    +          case null => None
    +          case _ => Some(new Path(s"hdfs://$ns").getFileSystem(hadoopConf))
    --- End diff --
    
    Also here.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r186015828
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -196,11 +196,17 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .toSet
     
    +    // add the list of available namenodes for all namespaces in HDFS federation
    +    val hadoopFilesystems = Option(hadoopConf.get("dfs.nameservices"))
    +      .toSeq.flatMap(_.split(","))
    +      .map(ns => hadoopConf.get(s"dfs.namenode.rpc-address.$ns"))
    --- End diff --
    
    Will the configuration "dfs.namenode.rpc-address.xxx" always exist? Should we check whether it is null?
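
    One way to express such a null check, as a minimal sketch. It assumes java.util.Properties as a stand-in for the Hadoop configuration, since its getProperty also returns null for unset keys; the NullSafeLookup object and rpcAddresses helper are hypothetical names. Wrapping the lookup in Option turns a missing key into None, and flatMap then drops it.

```scala
import java.util.Properties

object NullSafeLookup {
  /** Collects the configured RPC addresses, skipping nameservices with no entry. */
  def rpcAddresses(conf: Properties, nameservices: Seq[String]): Seq[String] =
    nameservices.flatMap { ns =>
      // Option(null) is None, so unset keys are dropped instead of propagating null.
      Option(conf.getProperty(s"dfs.namenode.rpc-address.$ns"))
    }
}
```

    With a plain `map` instead of `flatMap` over Option, the null would be passed on to whatever consumes the addresses.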


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90791/testReport)** for PR 21216 at commit [`6e4bf1b`](https://github.com/apache/spark/commit/6e4bf1b16e98316fb130a7ef53fbbd8a1ecead13).


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90717/testReport)** for PR 21216 at commit [`57ad32f`](https://github.com/apache/spark/commit/57ad32fb106f5b443c4b0ce2741992ec74cacc34).


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r187570029
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -200,7 +200,19 @@ object YarnSparkHadoopUtil {
           .map(new Path(_).getFileSystem(hadoopConf))
           .getOrElse(FileSystem.get(hadoopConf))
     
    -    filesystemsToAccess + stagingFS
    +    // add the list of available namenodes for all namespaces in HDFS federation
    +    // if ViewFS is enabled, this is skipped as ViewFS already handles delegation tokens
    +    // for its namespaces
    +    val hadoopFilesystems = if (stagingFS.getScheme == "viewfs") {
    +      Set.empty
    +    } else {
    +      Option(hadoopConf.get("dfs.nameservices"))
    +        .toSeq.flatMap(_.split(","))
    +        .flatMap(ns => Option(hadoopConf.get(s"dfs.namenode.rpc-address.$ns")))
    --- End diff --
    
    Yes, you are right about HA, thanks. I am working on making it work for HA as well. I will update ASAP.
    
    > You should probably filter out the staging dir FS in that case, too, although maybe it's already taken care of (since filesystemsToAccess is a set).
    
    Yes, it is already taken care of since it is a set. I have also tested this in the UT I added.
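
    The set-based deduplication can be illustrated with a minimal sketch. It uses java.net.URI values as a hypothetical stand-in for the filesystem handles; the DedupSketch object and withStaging helper are invented names. It relies only on the elements comparing equal, and whether two real filesystem handles compare equal is outside what this sketch shows.

```scala
import java.net.URI

object DedupSketch {
  /** Adds the staging location to the set; an equal element is not added twice. */
  def withStaging(filesystems: Set[URI], staging: URI): Set[URI] =
    filesystems + staging
}
```

    Adding a staging URI that is already present leaves the set size unchanged, so no explicit filter is needed in that case.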
    



---



[GitHub] spark pull request #21216: [SPARK-24149][YARN] Retrieve all federated namesp...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21216#discussion_r188792203
  
    --- Diff: docs/running-on-yarn.md ---
    @@ -426,8 +426,10 @@ To use a custom metrics.properties for the application master and executors, upd
     Standard Kerberos support in Spark is covered in the [Security](security.html#kerberos) page.
     
     In YARN mode, when accessing Hadoop file systems, aside from the service hosting the user's home
    --- End diff --
    
    Not your fault, but this doesn't seem accurate given the code, which doesn't seem to look at the home directory at all.


---



[GitHub] spark issue #21216: [SPARK-24149][YARN] Retrieve all federated namespaces to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21216
  
    **[Test build #90195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90195/testReport)** for PR 21216 at commit [`aeb6219`](https://github.com/apache/spark/commit/aeb6219f929ae1997ee1823143c08f22460c88af).


---
