Posted to reviews@spark.apache.org by zhpengg <gi...@git.apache.org> on 2014/05/26 15:16:58 UTC
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
GitHub user zhpengg opened a pull request:
https://github.com/apache/spark/pull/883
SPARK-1929 DAGScheduler suspended by local task OOM
DAGScheduler does not handle local task OOM properly, and will wait for the job result forever.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhpengg/spark bugfix-dag-scheduler-oom
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/883.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #883
----
commit aa63161c0e5ee535b220dbfbb07997ff4c4f0722
Author: Zhen Peng <zh...@baidu.com>
Date: 2014-05-26T13:15:21Z
SPARK-1929 DAGScheduler suspended by local task OOM
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44233458
Merged build started.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44215677
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15210/
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by zhpengg <gi...@git.apache.org>.
Github user zhpengg commented on a diff in the pull request:
https://github.com/apache/spark/pull/883#discussion_r13061419
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -580,6 +580,13 @@ class DAGScheduler(
case e: Exception =>
jobResult = JobFailed(e)
job.listener.jobFailed(e)
+ case oom: OutOfMemoryError =>
+ val errors: StringWriter = new StringWriter()
--- End diff --
Thanks @rxin, I have removed the redundant memory allocations.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/883#discussion_r13057704
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -580,6 +580,13 @@ class DAGScheduler(
case e: Exception =>
jobResult = JobFailed(e)
job.listener.jobFailed(e)
+ case oom: OutOfMemoryError =>
+ val errors: StringWriter = new StringWriter()
--- End diff --
When it is actually OOM, should we try to avoid allocating new objects to make sure it can recover gracefully?
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44235030
Added this commit: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commitdiff;h=ef690e1f69cb8e2e03bb0c43e3ccb2c54c995df7
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44213421
Merged build triggered.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44233453
Merged build triggered.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44234938
Actually, never mind; I will just do that when I commit the change. Merging this into master. Thanks!
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44236738
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15219/
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44213427
Merged build started.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44215676
Merged build finished. All automated tests passed.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/883#discussion_r13061790
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -580,6 +580,10 @@ class DAGScheduler(
case e: Exception =>
jobResult = JobFailed(e)
job.listener.jobFailed(e)
+ case oom: OutOfMemoryError =>
+ val exception = new SparkException("job failed for Out of memory exception", oom)
--- End diff --
Can you change the error message to "Local job aborted due to out of memory error"?
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by zhpengg <gi...@git.apache.org>.
Github user zhpengg commented on a diff in the pull request:
https://github.com/apache/spark/pull/883#discussion_r13060720
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -580,6 +580,13 @@ class DAGScheduler(
case e: Exception =>
jobResult = JobFailed(e)
job.listener.jobFailed(e)
+ case oom: OutOfMemoryError =>
+ val errors: StringWriter = new StringWriter()
--- End diff --
Yes, catching the OOM error may not be a good idea, but at this point we can't tell whether the error was thrown by the local task or by the driver itself. We are just trying to let the DAG scheduler recover from the suspended state described in this pull request.
Any advice would be appreciated!
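For readers following the thread, here is a minimal sketch of why the existing handler never fires for this case. It assumes nothing about Spark itself: `OomEscapeDemo` and `runLocally` are hypothetical names used only for illustration. The point is that `OutOfMemoryError` extends `Error`, not `Exception`, so a `case e: Exception` clause does not match it.

```scala
object OomEscapeDemo {
  // Hypothetical stand-in for a local job body guarded only by an
  // Exception handler, similar in shape to the context lines in the diff.
  def runLocally(body: => Unit): String =
    try {
      body
      "succeeded"
    } catch {
      case e: Exception => "failed: " + e.getMessage
    }

  // Returns true when an OutOfMemoryError escapes runLocally, which is
  // what happens because OutOfMemoryError is an Error, not an Exception.
  def oomEscapes(): Boolean =
    try {
      runLocally(throw new OutOfMemoryError("simulated"))
      false
    } catch {
      case _: OutOfMemoryError => true
    }
}
```

In this sketch an ordinary exception is turned into a failure result, while the simulated OOM propagates past the handler, so nothing downstream is told that the job failed.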
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44188196
Can one of the admins verify this patch?
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44213142
Jenkins, add to whitelist.
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/883#discussion_r13060747
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -580,6 +580,13 @@ class DAGScheduler(
case e: Exception =>
jobResult = JobFailed(e)
job.listener.jobFailed(e)
+ case oom: OutOfMemoryError =>
+ val errors: StringWriter = new StringWriter()
--- End diff --
What if instead of allocating more stuff, you just put the following:
```scala
val exception = new SparkException("Out of memory exception", oom)
```
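To show how that one-line suggestion might sit inside a handler, here is a self-contained sketch. It is not the code that was merged: `SparkException` below is a local stand-in so the example compiles without Spark on the classpath, and `JobListener` and `LocalJobSketch.runLocalJob` are simplified, hypothetical shapes rather than Spark's real ones.

```scala
// Local stand-in for org.apache.spark.SparkException (assumption: a
// message-plus-cause constructor, as used in the suggestion above).
class SparkException(message: String, cause: Throwable)
  extends Exception(message, cause)

// Hypothetical, simplified listener; Spark's own listener differs.
trait JobListener {
  def jobFailed(e: Exception): Unit
}

object LocalJobSketch {
  // Runs the body and reports any failure to the listener instead of
  // letting it escape and leaving the caller waiting.
  def runLocalJob(listener: JobListener)(body: => Unit): Unit =
    try {
      body
    } catch {
      case e: Exception =>
        listener.jobFailed(e)
      case oom: OutOfMemoryError =>
        // Only one small object is allocated here; the original error
        // is kept as the cause rather than rendered into a string.
        listener.jobFailed(new SparkException("Out of memory exception", oom))
    }
}
```

Keeping the `OutOfMemoryError` as the cause preserves its stack trace for whoever inspects the failure, without building a `StringWriter` while memory is already short.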
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/883
---
[GitHub] spark pull request: SPARK-1929 DAGScheduler suspended by local tas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/883#issuecomment-44236736
Merged build finished.