Posted to reviews@spark.apache.org by zhpengg <gi...@git.apache.org> on 2014/05/26 15:16:58 UTC

[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

GitHub user zhpengg opened a pull request:

    https://github.com/apache/spark/pull/883

    SPARK-1929 DAGScheduler suspended by local task OOM

    DAGScheduler does not handle an OutOfMemoryError (OOM) thrown by a local task, and will wait for the job result forever.
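Here is a minimal sketch of why the job hangs. It assumes that the local-job path only catches `Exception`, which matches the diff context quoted later in this thread. `OutOfMemoryError` extends `java.lang.Error` rather than `Exception`, so it escapes that handler and the waiting caller is never notified. `SimpleWaiter` and `PrePatch.runLocally` are hypothetical stand-ins, not Spark's actual `JobWaiter` or `DAGScheduler` code.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical, simplified stand-in for a job waiter: the caller blocks
// until jobFinished() or jobFailed() is called. Not Spark's JobWaiter.
class SimpleWaiter {
  private val done = new CountDownLatch(1)
  @volatile var failure: Option[Throwable] = None

  def jobFinished(): Unit = done.countDown()

  def jobFailed(t: Throwable): Unit = {
    failure = Some(t)
    done.countDown()
  }

  // Returns true if the job ended (successfully or not) within the timeout.
  def awaitResult(timeoutMs: Long): Boolean =
    done.await(timeoutMs, TimeUnit.MILLISECONDS)
}

object PrePatch {
  // Hypothetical model of the pre-patch local-job path: only Exception
  // subclasses are reported to the waiter, so an Error such as
  // OutOfMemoryError propagates out and the waiter is never signalled.
  def runLocally(task: () => Unit, waiter: SimpleWaiter): Unit =
    try {
      task()
      waiter.jobFinished()
    } catch {
      case e: Exception => waiter.jobFailed(e)
    }
}
```

If the task throws `new OutOfMemoryError("simulated")`, the error propagates out of `PrePatch.runLocally` and a later `waiter.awaitResult(100)` returns `false`. An ordinary `IllegalStateException`, by contrast, is reported to the waiter, and the waiter completes.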

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhpengg/spark bugfix-dag-scheduler-oom

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/883.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #883
    
----
commit aa63161c0e5ee535b220dbfbb07997ff4c4f0722
Author: Zhen Peng <zh...@baidu.com>
Date:   2014-05-26T13:15:21Z

    SPARK-1929 DAGScheduler suspended by local task OOM

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44233458
  
    Merged build started. 


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44215677
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15210/


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by zhpengg <gi...@git.apache.org>.
Github user zhpengg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/883#discussion_r13061419
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -580,6 +580,13 @@ class DAGScheduler(
           case e: Exception =>
             jobResult = JobFailed(e)
             job.listener.jobFailed(e)
    +      case oom: OutOfMemoryError =>
    +        val errors: StringWriter = new StringWriter()
    --- End diff --
    
    Thanks @rxin, I have removed the redundant memory allocations.


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/883#discussion_r13057704
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -580,6 +580,13 @@ class DAGScheduler(
           case e: Exception =>
             jobResult = JobFailed(e)
             job.listener.jobFailed(e)
    +      case oom: OutOfMemoryError =>
    +        val errors: StringWriter = new StringWriter()
    --- End diff --
    
    When it is actually OOM, should we try to avoid allocating new objects to make sure it can recover gracefully?


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44235030
  
    Added this commit: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commitdiff;h=ef690e1f69cb8e2e03bb0c43e3ccb2c54c995df7


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44213421
  
     Merged build triggered. 


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44233453
  
     Merged build triggered. 


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44234938
  
    Actually, never mind; I will just do that when I commit the change. Merging this into master. Thanks!



[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44236738
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15219/


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44213427
  
    Merged build started. 


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44215676
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/883#discussion_r13061790
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -580,6 +580,10 @@ class DAGScheduler(
           case e: Exception =>
             jobResult = JobFailed(e)
             job.listener.jobFailed(e)
    +      case oom: OutOfMemoryError =>
    +        val exception = new SparkException("job failed for Out of memory exception", oom)
    --- End diff --
    
    Can you change the error message to "Local job aborted due to out of memory error"?
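Here is a sketch of the handler shape being requested. It assumes the new `OutOfMemoryError` case mirrors the existing `Exception` branch shown in the diff: record `JobFailed`, then notify the listener. `SparkException`, `JobResult`, `JobSucceeded`, `JobFailed`, and `Listener` are simplified local stand-ins so the example compiles without Spark. They are not Spark's real classes.

```scala
// Local stand-ins so the sketch compiles without Spark on the classpath.
class SparkException(message: String, cause: Throwable)
  extends Exception(message, cause)

sealed trait JobResult
case object JobSucceeded extends JobResult
final case class JobFailed(exception: Exception) extends JobResult

trait Listener {
  def jobFailed(exception: Exception): Unit
}

object PostPatch {
  // Runs a local job. On failure, it records and reports the error;
  // success reporting to the listener is omitted in this sketch.
  def runLocally(task: () => Unit, listener: Listener): JobResult = {
    var jobResult: JobResult = JobSucceeded
    try {
      task()
    } catch {
      case e: Exception =>
        jobResult = JobFailed(e)
        listener.jobFailed(e)
      case oom: OutOfMemoryError =>
        // Wrap rather than render the stack trace, keeping allocation on
        // this path small. The original error stays reachable via getCause.
        val exception =
          new SparkException("Local job aborted due to out of memory error", oom)
        jobResult = JobFailed(exception)
        listener.jobFailed(exception)
    }
    jobResult
  }
}
```

The OOM branch reports through the same listener call as the `Exception` branch, so a caller blocked on the result is released instead of waiting forever.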
    



[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by zhpengg <gi...@git.apache.org>.
Github user zhpengg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/883#discussion_r13060720
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -580,6 +580,13 @@ class DAGScheduler(
           case e: Exception =>
             jobResult = JobFailed(e)
             job.listener.jobFailed(e)
    +      case oom: OutOfMemoryError =>
    +        val errors: StringWriter = new StringWriter()
    --- End diff --
    
    Yes, maybe trying to catch the OOM error is not a good idea, but here we can't tell whether the error was thrown by the local task or by the driver itself. We just try to recover the DAG scheduler from that situation.
    Any advice would be appreciated!


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44188196
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44213142
  
    Jenkins, add to whitelist.


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/883#discussion_r13060747
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -580,6 +580,13 @@ class DAGScheduler(
           case e: Exception =>
             jobResult = JobFailed(e)
             job.listener.jobFailed(e)
    +      case oom: OutOfMemoryError =>
    +        val errors: StringWriter = new StringWriter()
    --- End diff --
    
    What if instead of allocating more stuff, you just put the following:
    ```scala
    val exception = new SparkException("Out of memory exception", oom)
    ```
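Here is a small comparison of the two approaches discussed in this comment. `formatTrace` is an assumed reconstruction of the removed code, since only its `StringWriter` line appears in the diff. It renders the whole stack trace into a new string, so its allocations grow with the size of the trace. `wrap` follows the suggestion above, with plain `Exception` standing in for `SparkException` so the sketch needs no Spark dependency. It allocates one wrapper and leaves the original trace reachable through `getCause`. Both method names are hypothetical.

```scala
import java.io.{PrintWriter, StringWriter}

object OomReporting {
  // Assumed shape of the original patch: render the full stack trace into
  // a String. This allocates a StringWriter buffer, a PrintWriter, and a
  // String whose size grows with the trace.
  def formatTrace(oom: OutOfMemoryError): String = {
    val errors = new StringWriter()
    oom.printStackTrace(new PrintWriter(errors))
    errors.toString
  }

  // Suggested approach: allocate one wrapper exception. The original trace
  // is neither copied nor rendered to text; it stays reachable via getCause.
  def wrap(oom: OutOfMemoryError): Exception =
    new Exception("Out of memory exception", oom)
}
```

Wrapping does not guarantee success when the heap is truly exhausted. It only keeps the work done on the failure path small.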


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/883


[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/883#issuecomment-44236736
  
    Merged build finished. 

