Posted to issues@spark.apache.org by "Jacek Laskowski (JIRA)" <ji...@apache.org> on 2015/10/04 16:47:26 UTC

[jira] [Created] (SPARK-10921) Completely remove the use of SparkContext.preferredNodeLocationData

Jacek Laskowski created SPARK-10921:
---------------------------------------

             Summary: Completely remove the use of SparkContext.preferredNodeLocationData
                 Key: SPARK-10921
                 URL: https://issues.apache.org/jira/browse/SPARK-10921
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, YARN
    Affects Versions: 1.5.1
            Reporter: Jacek Laskowski
            Priority: Minor


SPARK-8949 obsoleted the use of {{SparkContext.preferredNodeLocationData}} yet the code makes it less obvious as it says (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L93-L96):

{code}
  // This is used only by YARN for now, but should be relevant to other cluster types (Mesos,
  // etc) too. This is typically generated from InputFormatInfo.computePreferredLocations. It
  // contains a map from hostname to a list of input format splits on the host.
  private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()
{code}

It turns out that the field is still initialized in several places, which only adds to the confusion.

When you search for uses of {{SparkContext.preferredNodeLocationData}},
you'll find three places: a constructor marked {{@deprecated}}, another constructor with a
{{logWarning}} telling us that _"Passing in preferred locations has no
effect at all, see SPARK-8949"_, and the
{{org.apache.spark.deploy.yarn.ApplicationMaster.registerAM}} method.
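For context, the deprecated constructor looks roughly like this (a sketch with abbreviated parameters, not the verbatim source):

{code}
  @deprecated("Passing in preferred locations has no effect at all, see SPARK-8949", "1.5.0")
  def this(
      master: String,
      appName: String,
      sparkHome: String = null,
      jars: Seq[String] = Nil,
      environment: Map[String, String] = Map(),
      preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()) = {
    this(SparkContext.updatedConf(new SparkConf(), master, appName, sparkHome, jars, environment))
    if (preferredNodeLocationData.nonEmpty) {
      logWarning("Passing in preferred locations has no effect at all, see SPARK-8949")
    }
    // The field keeps being written even though nothing reads it downstream.
    this.preferredNodeLocationData = preferredNodeLocationData
  }
{code}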

Given that the field is, in theory, no longer used, there is no consistent approach to dealing with it.

The [org.apache.spark.deploy.yarn.ApplicationMaster.registerAM|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L234-L265] method
caught my eye: it passes the following as an argument to
{{client.register}}:

{code}
if (sc != null) sc.preferredNodeLocationData else Map()
{code}

However, {{client.register}} [ignores that parameter completely|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala#L47-L78], even though the scaladoc still documents it (note the {{preferredNodeLocations}} param):

{code}
  /**
   * Registers the application master with the RM.
   *
   * @param conf The Yarn configuration.
   * @param sparkConf The Spark configuration.
   * @param preferredNodeLocations Map with hints about where to allocate containers.
   * @param uiAddress Address of the SparkUI.
   * @param uiHistoryAddress Address of the application on the History Server.
   */
{code}
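To make the mismatch concrete, the relationship between the documented parameter and the body is roughly the following (a simplified, hypothetical sketch, not the verbatim {{YarnRMClient}} source; other parameters abbreviated):

{code}
  def register(
      conf: YarnConfiguration,
      sparkConf: SparkConf,
      preferredNodeLocations: Map[String, Set[SplitInfo]], // documented above, never read
      uiAddress: String,
      uiHistoryAddress: String): YarnAllocator = {
    // registers the AM with the RM and creates the allocator;
    // preferredNodeLocations is not referenced anywhere in the body
  }
{code}

Dropping the parameter (and its {{@param}} line) would make the dead code path explicit and remove one more place where {{preferredNodeLocationData}} appears to matter.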



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org