You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2015/02/27 21:57:43 UTC

[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/4818

    [SPARK-6050] [yarn] Add config option to do lax resource matching.

    Some YARN configurations return a resource structure for allocated
    containers that does not match the requested resource. That means
    Spark would always ignore those containers. So add an option to relax
    the matching of resources, so that users can still run Spark apps in
    those situations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-6050

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4818
    
----
commit 335969297626b96ca5f2d76b297ab156dafccd0a
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-02-27T20:56:06Z

    [SPARK-6050] [yarn] Add config option to do lax resource matching.
    
    Some YARN configurations return a resource structure for allocated
    containers that does not match the requested resource. That means
    Spark would always ignore those containers. So add an option to relax
    the matching of resources, so that users can still run Spark apps in
    those situations.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76515153
  
    @sryza When cpu scheduling is enabled (ref @tgravescs comment here and in jira) it must be validated.
    Just as we validate memory and while prioritizing based on locality of the returned resource.
    
    It is a current implementation detail of YARN that it tries to ensure that response contains resource with adequate cpu and memory (or cpu scheduling is disabled) - but this could easily change in future.
    Ideally, yarn must have returned requested vCores in the response when cpu scheduling is disabled ... but that is a different issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76486637
  
      [Test build #28090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28090/consoleFull) for   PR 4818 at commit [`3359692`](https://github.com/apache/spark/commit/335969297626b96ca5f2d76b297ab156dafccd0a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `                logError("User class threw exception: " + cause.getMessage, cause)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76474129
  
      [Test build #28090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28090/consoleFull) for   PR 4818 at commit [`3359692`](https://github.com/apache/spark/commit/335969297626b96ca5f2d76b297ab156dafccd0a).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25540573
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,18 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a resource configuration that doesn't match
    +    // the request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in
    +    // those situations to disable resource matching.
    +    val matchingResource =
    +        if (sparkConf.getBoolean("spark.yarn.container.matchAnyResource", false)) {
    --- End diff --
    
    Huh, no, if you want to match any you want to use the `resource` field, not the value returned by yarn (`allocatedContainer.getResource`). That means the comparison will effectively be `resource == resource` which will always be true.
    
    I can change the config name if you think that would be clearer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76470593
  
    @tgravescs @mridulm 
    
    Tested:
    - --executor-cores 1, no conf = passed
    - --executor-cores 2, no conf = cannot allocate resources, job waits forever
    ---executor-cores 2, new conf enabled = passed
    
    I chose to leave the default value as "false" because it seems more correct to be strict when matching, but can change it if others feel that's more appropriate. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25545356
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,18 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a resource configuration that doesn't match
    +    // the request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in
    +    // those situations to disable resource matching.
    +    val matchingResource =
    +        if (sparkConf.getBoolean("spark.yarn.container.matchAnyResource", false)) {
    --- End diff --
    
    Oh, sorry I see now.  I was reading it backwards and thinking it was matching what was actually allocated.
    
    So I guess the question is whether we want the default of this to be true so that its backwards compatible.  Otherwise the behavior changes for anyone running now that upgrades.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Relax matching of vcore co...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4818


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76486649
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28090/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25540656
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,18 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a resource configuration that doesn't match
    +    // the request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in
    +    // those situations to disable resource matching.
    +    val matchingResource =
    +        if (sparkConf.getBoolean("spark.yarn.container.matchAnyResource", false)) {
    --- End diff --
    
    re: memory, can yarn really allocate a container with less resources than you asked for?
    
    The `resource` in this class is pretty much static during the Spark app's lifetime.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Relax matching of vcore co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76794507
  
      [Test build #28177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28177/consoleFull) for   PR 4818 at commit [`991c803`](https://github.com/apache/spark/commit/991c80380669f03922c695005ebfdf1d0e81b85b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Relax matching of vcore co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76794530
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28177/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25540303
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,18 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a resource configuration that doesn't match
    +    // the request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in
    +    // those situations to disable resource matching.
    +    val matchingResource =
    +        if (sparkConf.getBoolean("spark.yarn.container.matchAnyResource", false)) {
    --- End diff --
    
    seems like the config is named backwards?  If I want to match any then I want to use allocatedContainer.getResource.
    
    Perhaps matchExactResource or keep the name and switch what you get.  I would expect to match any by default so its backwards compatible for now.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76556828
  
    I think having the default true would be better so that its backwards compatible.  As @sryza  mentioned YARN shouldn't really be giving you containers smaller then you requested anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76473405
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25544439
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,18 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a resource configuration that doesn't match
    +    // the request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in
    +    // those situations to disable resource matching.
    +    val matchingResource =
    +        if (sparkConf.getBoolean("spark.yarn.container.matchAnyResource", false)) {
    --- End diff --
    
    Sorry I'm still not seeing it. resource is:
    
    Resource.newInstance(executorMemory + memoryOverhead, executorCores)
    
    which is going to be executorCores passed in, which would be for instance 8 if I request 8. 
    
    That resource is then passed to amClient.getMatchingRequests which is going to find requests that have 8 vcores, which isn't what we want because the RM without cpu scheduling returns ones with 1.  
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25540529
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,18 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a resource configuration that doesn't match
    +    // the request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in
    +    // those situations to disable resource matching.
    +    val matchingResource =
    +        if (sparkConf.getBoolean("spark.yarn.container.matchAnyResource", false)) {
    --- End diff --
    
    Also as @mridulm mentioned we are basically just matching what we got back which disables all checks. Before we were checking that memory was atleast big enough.
    
      private def isResourceConstraintSatisfied(container: Container): Boolean = {
        container.getResource.getMemory >= (executorMemory + memoryOverhead)
      }
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25555091
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,19 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a virtual core count that doesn't match the
    +    // request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in those
    +    // situations to disable matching of the core count.
    +    val matchingResource =
    +      if (sparkConf.getBoolean("spark.yarn.container.disableCpuMatching", false)) {
    +        Resource.newInstance(allocatedContainer.getResource().getMemory(),
    --- End diff --
    
    Actually, why not just use `allocatedContainer.getResource`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76520133
  
    
    AFAIK this is not documented or part of the YARN interfaces/public contract : I would prefer that spark depended on defined interfaces which are reasonably stable.
    As and when YARN stabilizes and documents their interfaces; we can modify our codebase if need be - but until then let us defensive about using implementation details.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76492244
  
      [Test build #28095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28095/consoleFull) for   PR 4818 at commit [`8c9c346`](https://github.com/apache/spark/commit/8c9c346b3f8f36eff18ca4a00a353e959ca050bf).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76516622
  
    I wouldn't really agree that this is a YARN implementation detail.  This is of course somewhat subjective given that YARN doesn't really document this behavior, but, speaking with my YARN committer hat on, I'd consider a change that makes YARN return smaller containers than requested when CPU scheduling is on a break in compatibility.
    
    Not strongly opposed to adding a config, but, if we do, I think it's better for the default to optimize for ease of use over the remote possibility that a future version of YARN might change behavior in this way.
    
    Also, whatever behavior we decide on, it would be good (and should be straightforward) to add a test for it in `YarnAllocatorSuite`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76473277
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28087/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76502508
  
    It seems reasonable to me to have the default of "false" and make a comment in the release notes. No strong feelings here though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25555088
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,19 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a virtual core count that doesn't match the
    +    // request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in those
    +    // situations to disable matching of the core count.
    +    val matchingResource =
    +      if (sparkConf.getBoolean("spark.yarn.container.disableCpuMatching", false)) {
    +        Resource.newInstance(allocatedContainer.getResource().getMemory(),
    --- End diff --
    
    Nit: take out parens for consistency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76514683
  
    My opinion is that we should make the default true, as the vanilla YARN default of `FIFOScheduler` will run into this issue (though most vendor distributions have a better default).  There are no versions of YARN that will return containers smaller than were requested, except in this weird situation where the scheduler doesn't support CPU scheduling.  I actually think it might be better to avoid a config at all and always just avoid matching on CPU.  It's really hard to imagine any situation where it would actually benefit someone to set the config to false.  The only one I can think of is debugging incorrect behavior in YARN, and, if we care about that, it would be better to just log something.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76487019
  
    Looks good to me - pending addressing Tom's comment about what the default should be.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76487231
  
    No strong opinion from me on the default. Whatever people prefer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76471465
  
    This is specific to vcores and not mem iirc.
    A solution might be to check vcores returned and modify it to what we
    requested if found to be 1 when flag is set (we loose actual men allocated,
    other info if we replace with 'resource ' all the time, no ?)
    
    On Friday, February 27, 2015, Marcelo Vanzin <no...@github.com>
    wrote:
    
    > @tgravescs <https://github.com/tgravescs> @mridulm
    > <https://github.com/mridulm>
    >
    > Tested:
    >
    >    - --executor-cores 1, no conf = passed
    >    - --executor-cores 2, no conf = cannot allocate resources, job waits
    >    forever ---executor-cores 2, new conf enabled = passed
    >
    > I chose to leave the default value as "false" because it seems more
    > correct to be strict when matching, but can change it if others feel that's
    > more appropriate.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/4818#issuecomment-76470593>.
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76769047
  
      [Test build #28177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28177/consoleFull) for   PR 4818 at commit [`991c803`](https://github.com/apache/spark/commit/991c80380669f03922c695005ebfdf1d0e81b85b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76481190
  
      [Test build #28095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28095/consoleFull) for   PR 4818 at commit [`8c9c346`](https://github.com/apache/spark/commit/8c9c346b3f8f36eff18ca4a00a353e959ca050bf).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76764668
  
    I'll just remove the option then, since it doesn't seem very useful to have an option to enable more strict matching given the discussion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76659139
  
    @vanzin okay so maybe set this to true then? I don't have any opinion, but would love to get this in as it's one of the only release blockers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Relax matching of vcore co...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76842747
  
    this looks good to me. +1. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76492252
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28095/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4818#discussion_r25544599
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -290,8 +290,18 @@ private[yarn] class YarnAllocator(
           location: String,
           containersToUse: ArrayBuffer[Container],
           remaining: ArrayBuffer[Container]): Unit = {
    +    // SPARK-6050: certain Yarn configurations return a resource configuration that doesn't match
    +    // the request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in
    +    // those situations to disable resource matching.
    +    val matchingResource =
    +        if (sparkConf.getBoolean("spark.yarn.container.matchAnyResource", false)) {
    --- End diff --
    
    Take a look at the latest version to see if it's any clearer.
    
    But the gist is that `amClient.getMatchingResources()` is matching the resources you asked for (which is `resource`) against the parameter you're passing. What the option controls is whether you're passing `resource` also as the resource to match against - so basically, the exact same structure that is already in the outstanding list of requests.
    
    So I think you're reading the condition backwards. Or something.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4818#issuecomment-76471641
  
    We don't really lose anything, as far as I can tell. That information is only used to make sure that the allocated containers match those that were requested, not to do any other scheduling within Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org