Posted to reviews@spark.apache.org by ala <gi...@git.apache.org> on 2017/01/26 18:43:08 UTC

[GitHub] spark pull request #16713: [SC-5550] Automatic killing of tasks that are producing too many output rows

GitHub user ala opened a pull request:

    https://github.com/apache/spark/pull/16713

    [SC-5550] Automatic killing of tasks that are producing too many output rows

    ## What changes were proposed in this pull request?
    
    This change implements TaskOutputListener, which continuously monitors the metric updates sent periodically by tasks in execution. In particular, the number of records read, generated, and produced is inspected. Tasks for which the ratio between the number of records produced and the number of records read plus generated exceeds the threshold are canceled. This mechanism is off by default and can be turned on by setting the parameter spark.outputRatioKillThreshold.
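    
    The listener itself is not shown in this thread. As a rough illustration of the mechanism, a SparkListener can observe per-task accumulator updates and kill tasks whose output ratio crosses the threshold. The class name, metric names, and the kill call below are assumptions for illustration only, not the PR's actual code:
    
        import org.apache.spark.SparkContext
        import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorMetricsUpdate}
        
        // Hypothetical sketch of the kill-on-output-ratio mechanism. The class
        // name, metric names, and kill primitive are illustrative assumptions.
        class OutputRatioKillListener(sc: SparkContext, threshold: Double)
          extends SparkListener {
        
          override def onExecutorMetricsUpdate(
              update: SparkListenerExecutorMetricsUpdate): Unit = {
            // Each entry is (taskId, stageId, stageAttemptId, accumulator updates).
            update.accumUpdates.foreach { case (taskId, _, _, accums) =>
              // Look up the latest value of a named task metric, if reported.
              def metric(metricName: String): Long = accums
                .find(_.name.exists(_.contains(metricName)))
                .flatMap(_.update)
                .map(_.toString.toLong)
                .getOrElse(0L)
        
              // Rows read + generated vs. rows produced, per the description above.
              val consumed = metric("records read") + metric("number of generated rows")
              val produced = metric("records written")
        
              if (consumed > 0 && produced.toDouble / consumed > threshold) {
                // killTaskAttempt was added to SparkContext in later Spark
                // releases; this PR predates it and may cancel tasks differently.
                sc.killTaskAttempt(taskId, interruptThread = true,
                  reason = s"output/input row ratio exceeded $threshold")
              }
            }
          }
        }
    
    Such a listener would be registered with sc.addSparkListener(...). Note that executor metric updates only arrive at heartbeat intervals, so a listener-based implementation would react with similar latency.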
    
    Additionally, a number of tests were added to check the correctness of the metrics mentioned above.
    
    For the Range operator, a new metric "number of generated rows" was added.
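    
    Since spark.outputRatioKillThreshold is not an upstream Spark setting (this PR was not merged), the exact way to enable it is an assumption; presumably it would be passed like any other Spark configuration, with an arbitrary threshold value such as:
    
        $ spark-submit --conf spark.outputRatioKillThreshold=10000 ...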
    
    ## How was this patch tested?
    
    Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ala/spark metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16713
    
----
commit 29d36b879550d03551169cd10ae2d22c5e54c1c1
Author: Ala Luszczak <al...@databricks.com>
Date:   2017-01-26T17:36:01Z

    This change implements TaskOutputListener, which continuously monitors the metric updates sent periodically by tasks in execution. In particular, the number of records read, generated, and produced is inspected. Tasks for which the ratio between the number of records produced and the number of records read plus generated exceeds the threshold are canceled. This mechanism is off by default and can be turned on by setting the parameter spark.outputRatioKillThreshold.
    
    Additionally, a number of tests were added to check the correctness of the metrics mentioned above.
    
    For the Range operator, a new metric "number of generated rows" was added.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16713: [SC-5550] Automatic killing of tasks that are producing too many output rows

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16713
  
    Can one of the admins verify this patch?


[GitHub] spark pull request #16713: [SC-5550] Automatic killing of tasks that are producing too many output rows

Posted by ala <gi...@git.apache.org>.
Github user ala closed the pull request at:

    https://github.com/apache/spark/pull/16713


[GitHub] spark issue #16713: [SC-5550] Automatic killing of tasks that are producing too many output rows

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16713
  
    What is SC-5550? That's not a Spark bug. You need to file a bug on the Spark tracker.

