Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2018/09/07 20:41:33 UTC

[GitHub] spark issue #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_DIR as ...

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22289
  
    `spark.hadoop.*` is not a good name. That's a special prefix in Spark that modifies any Hadoop `Configuration` object that Spark instantiates. That's the easy one.
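    
    Concretely, the prefix handling works along these lines (an illustrative sketch of the behavior, not Spark's exact code), which is why hanging a "config directory" setting off that prefix is confusing:
    
    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.SparkConf
    
    // Every "spark.hadoop.foo" entry in the SparkConf is copied into the Hadoop
    // Configuration as plain "foo", overriding whatever the *-site.xml files set.
    def applySparkHadoopPrefix(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
      sparkConf.getAll.foreach { case (key, value) =>
        if (key.startsWith("spark.hadoop.")) {
          hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
        }
      }
    }
    ```
    
    So a key like `spark.hadoop.conf.dir` would just become a Hadoop property named `conf.dir`, not a pointer to a directory of config files.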
    
    The hard one is that your change doesn't seem to achieve what your PR description says. What you're doing is just uploading the contents of `spark.hadoop.conf.dir` with your YARN app instead of the contents of `HADOOP_CONF_DIR`. That has several consequences:
    
    - the `Client` class is still using whatever Hadoop configuration is in the classpath to choose the YARN service that will actually run the app.
    - the uploaded config is actually added at the end of the classpath of the AM / executors; the RM places its own configuration before that in the classpath, so in the launched processes you're still *not* going to be using the configuration you defined in `spark.hadoop.conf.dir` (see the sketch after this list).
    - the configuration used by the `Client` class that I mention above is actually written to a separate file, also sent over to the AM / executors, and overlaid on top of the configuration they load (see `SparkHadoopUtil.newConfiguration`).
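    
    To make the classpath ordering point concrete, here's the kind of check one could run inside a launched container (just a sketch to illustrate the point; none of this is in the PR):
    
    ```scala
    import org.apache.hadoop.conf.Configuration
    
    // Hadoop's Configuration resolves core-site.xml (and the other default
    // resources) through the class loader, which returns the *first* match on
    // the classpath, i.e. the RM-provided config dir, not the uploaded files
    // appended at the end.
    object WhichConfWins {
      def main(args: Array[String]): Unit = {
        val cl = Thread.currentThread().getContextClassLoader
        println("core-site.xml resolved from: " + cl.getResource("core-site.xml"))
    
        val conf = new Configuration()
        println("fs.defaultFS = " + conf.get("fs.defaultFS"))
      }
    }
    ```
    
    And even if the uploaded files did come first, the overlay described in the last bullet would re-apply the client-side values on top of them.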
    
    So to actually achieve what you want to do, you'd have to fix at least two things (a rough sketch of the idea follows the list):
    
    - `SparkHadoopUtil.newConfiguration`
    - the way `Client` creates the YARN configuration (which is [here](https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L68))
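    
    For illustration, the kind of change needed in both places might look something like this (a hypothetical sketch: the helper name and the approach of explicitly registering the directory's files are assumptions, not something this PR does):
    
    ```scala
    import java.io.File
    
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    
    // Hypothetical helper: explicitly register every *.xml file from a
    // user-specified conf dir so those files override whatever the classpath
    // already provided (resources added later win over earlier ones, unless
    // a property is marked final).
    def addUserConfDir(hadoopConf: Configuration, confDir: String): Unit = {
      val dir = new File(confDir)
      if (dir.isDirectory) {
        dir.listFiles()
          .filter(_.getName.endsWith(".xml"))
          .sortBy(_.getName)
          .foreach(f => hadoopConf.addResource(new Path(f.toURI)))
      }
    }
    ```
    
    Something along those lines would have to run both in `SparkHadoopUtil.newConfiguration` and where `Client` builds its YARN configuration, before anything reads values out of them.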
    
    Otherwise, this change isn't actually doing much that I can see.


---
