Posted to reviews@spark.apache.org by witgo <gi...@git.apache.org> on 2014/07/12 19:27:21 UTC

[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

GitHub user witgo opened a pull request:

    https://github.com/apache/spark/pull/1387

    [WIP] Run garbage collection on the driver when an executor throws OutOfMemoryError

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark taskEvent

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1387
    
----
commit 65e281230adbf8ce05255fb723e31f5015e785a9
Author: witgo <wi...@qq.com>
Date:   2014-07-02T14:58:12Z

    add TaskEventListener

commit 12ab18b513f0fdbf326267c654ae9a9369b84001
Author: witgo <wi...@qq.com>
Date:   2014-07-06T04:19:19Z

    Merge branch 'master' of https://github.com/apache/spark into taskEvent

commit a62c30a9707bf2a1c03e3b31a5911ea21ab80e53
Author: witgo <wi...@qq.com>
Date:   2014-07-07T12:23:28Z

    move runGC to ContextCleaner

commit eaf57e377169133fca23d367492b4eeb14fae77d
Author: witgo <wi...@qq.com>
Date:   2014-07-07T15:23:20Z

    add OutOfMemoryError detection

commit f56d4f775dab8f9e2e26216f51e7b6e3a23172bd
Author: witgo <wi...@qq.com>
Date:   2014-07-12T14:01:41Z

    Merge branch 'master' of https://github.com/apache/spark into taskEvent

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49070018
  
    So your whole issue is with the `FileNotFoundException` being logged? Or, aside from that, is there a user-visible side-effect, such as the wrong job status being reported, or SparkContext not being properly stopped?
    
    But I'll just repeat it: `System.gc()` is not the way to fix this. If you really want to fix that, you need to do it properly. Otherwise, I'll live with the exception (which your fix may actually not prevent).



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49065703
  
    BTW, there is no `finalize()` that I can find in the Spark tree, so the only thing `System.gc()` is achieving here is freeing memory.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48859861
  
    QA tests have started for PR 1387. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16611/consoleFull



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48863186
  
    QA results for PR 1387:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16611/consoleFull



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49067472
  
    Yes, `System.gc()` is just advisory and may not actually free resources. But RDD has no close method; it can only be cleaned up by `ContextCleaner`.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49072476
  
    Yes, this solution is not perfect. I have been thinking about this problem.
    BTW, the `runGC` method runs GC and makes sure it actually has run; for reference, see https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala#L235
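
    For context, a minimal sketch of what such a helper looks like (modeled on the test-suite code linked above; the timeout and exception message are illustrative, not the actual patch):

    ```
    import java.lang.ref.WeakReference

    // Best-effort: request a GC and wait until a sentinel weak reference
    // has been cleared, or give up after a timeout. System.gc() is only
    // advisory, so this loop is not guaranteed to finish quickly, or at all.
    def runGC(timeoutMillis: Long = 10000): Unit = {
      val weakRef = new WeakReference(new Object())
      val deadline = System.currentTimeMillis + timeoutMillis
      System.gc()
      while (weakRef.get != null) {
        System.gc()
        if (System.currentTimeMillis > deadline) {
          throw new Exception("GC did not complete within the timeout")
        }
        Thread.sleep(200)
      }
    }
    ```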



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49072342
  
    Just as a thought exercise: what if you change `ContextCleaner.stop()` to interrupt the cleaning thread, wait for it to finish, and then manually clean all buffered references that haven't been cleaned yet?
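
    A rough sketch of that idea (the field and cleanup-task names here follow my memory of `ContextCleaner` and may not match the actual code exactly):

    ```
    def stop(): Unit = {
      stopped = true
      // Stop blocking on the reference queue and let the thread exit.
      cleaningThread.interrupt()
      cleaningThread.join()
      // Best-effort: synchronously clean everything still buffered,
      // whether or not the GC ever collected the referents.
      referenceBuffer.foreach { ref =>
        ref.task match {
          case CleanRDD(rddId) => doCleanupRDD(rddId, blocking = true)
          case CleanShuffle(shuffleId) => doCleanupShuffle(shuffleId, blocking = true)
          case CleanBroadcast(broadcastId) => doCleanupBroadcast(broadcastId, blocking = true)
        }
      }
      referenceBuffer.clear()
    }
    ```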



[GitHub] spark pull request: The driver perform garbage collection, when th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49316425
  
    QA results for PR 1387:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16779/consoleFull



[GitHub] spark pull request: [WIP][SPARK-2595]:The driver run garbage colle...

Posted by witgo <gi...@git.apache.org>.
Github user witgo closed the pull request at:

    https://github.com/apache/spark/pull/1387



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-2595]:The driver run garbage colle...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49901334
  
    @pwendell I don't think that's the goal here; the executor is already dead, so this is not targeted at freeing up executor memory through a very indirect way. This is targeted at cleanup: looking at the posted logs (also see SPARK-2491, although that's just the logs that are also posted here), there are exceptions that would be caused by code holding files open when the VM shuts down.
    
    I stand by my earlier comment that relying on the gc for this is hacky: the code that is making these references unreachable should be cleaning them up. BTW, this also means that, in my opinion, ContextCleaner.scala is similarly hacky, and should be replaced with code that does the right thing without relying on the gc. (Also, because ContextCleaner can be disabled through a conf option...) Regardless of whether one JVM implementation consistently triggers the gc when you call `System.gc()`, others might not, and that still doesn't mean everything that we'd like to be collected actually is.
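
    (For reference, the conf option I mean is the reference-tracking flag; a minimal example, assuming the `spark.cleaner.referenceTracking` key that gates whether a `ContextCleaner` is created at all:)

    ```
    import org.apache.spark.SparkConf

    // With reference tracking disabled, no ContextCleaner is created,
    // so none of this gc-driven cleanup runs in the first place.
    val conf = new SparkConf()
      .setAppName("example")
      .set("spark.cleaner.referenceTracking", "false")
    ```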



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48819579
  
    The JVM will run GC before throwing OOM. System.gc() doesn't necessarily ever invoke GC. What is the expected additional benefit then?
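
    (Concretely, on HotSpot an explicit GC request can even be disabled outright; a quick probe, assuming the standard `-XX:+DisableExplicitGC` flag:)

    ```
    object GcProbe {
      def main(args: Array[String]): Unit = {
        // Run with -XX:+DisableExplicitGC and the weak reference is
        // typically NOT cleared here, because System.gc() is ignored.
        val ref = new java.lang.ref.WeakReference(new Object())
        System.gc()
        Thread.sleep(100)
        println(s"cleared after System.gc(): ${ref.get == null}")
      }
    }
    ```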



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48826225
  
    This patch actually runs GC on the driver in response to an executor OOM (i.e., different JVMs entirely). What, then, is the intended use case?



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49061792
  
    `SparkContext.cleaner` will clean up unreferenced RDDs, shuffles, and broadcasts.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49074742
  
    I'm sorry, my English is poor. The problem now is that we do not have a reliable way to make sure the RDD is cleaned up. Shall I close this first?



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49058593
  
    If the executor is running out of memory, what good does it do to call `System.gc()` on the driver side? You're calling it in a SparkContext listener, which as far as I know only exists in the driver.



[GitHub] spark pull request: [WIP][SPARK-2595:]The driver run garbage colle...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49540472
  
    I've talked to many JVM developers (engineers who work on the JVM) and while System.gc is advisory in the spec, it is actually a pretty reliable way of triggering GC. 



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49077845
  
    OK, tomorrow or the day after I will try it the way you suggested. So far I have only tested the default GC configuration; I will test the others.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48821124
  
    QA results for PR 1387:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16589/consoleFull



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49068556
  
    So that's the problem. If ContextCleaner is relying on the gc to clean up after these references, and leaving those references "uncleaned" causes problems, then ContextCleaner needs to be fixed to either explicitly clean up these references when executors fail, or find some other way of doing it. Doing it through `System.gc()` is not a fix, because it's not guaranteed to work.
    
    But I'm actually not seeing the problem here at all. From your driver log, I'm not seeing the driver fail because of the OOM. I'm seeing the driver fail because the same task failed 4 times. Which is to be expected.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48818646
  
    QA tests have started for PR 1387. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16589/consoleFull



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49069644
  
    This involves a bug: https://issues.apache.org/jira/browse/SPARK-2491



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48819541
  
    Before throwing OOM, GC is expected to be run by the VM.

    Also note that on OOM, particularly in YARN mode, we kill the executor.
    On 12-Jul-2014 10:57 pm, "Guoqiang Li" <no...@github.com> wrote:
    
    > ------------------------------
    > You can merge this Pull Request by running
    >
    >   git pull https://github.com/witgo/spark taskEvent
    >
    > Or view, comment on, or merge it at:
    >
    >   https://github.com/apache/spark/pull/1387
    > Commit Summary
    >
    >    - add TaskEventListener
    >    - Merge branch 'master' of https://github.com/apache/spark into
    >    taskEvent
    >    - move runGC to ContextCleaner
    >    - add OutOfMemoryError detection
    >    - Merge branch 'master' of https://github.com/apache/spark into
    >    taskEvent
    >
    > File Changes
    >
    >    - *M* core/src/main/scala/org/apache/spark/ContextCleaner.scala
    >    <https://github.com/apache/spark/pull/1387/files#diff-0> (19)
    >    - *M* core/src/main/scala/org/apache/spark/SparkContext.scala
    >    <https://github.com/apache/spark/pull/1387/files#diff-1> (6)
    >    - *A*
    >    core/src/main/scala/org/apache/spark/scheduler/TaskEventListener.scala
    >    <https://github.com/apache/spark/pull/1387/files#diff-2> (43)
    >
    > Patch Links:
    >
    >    - https://github.com/apache/spark/pull/1387.patch
    >    - https://github.com/apache/spark/pull/1387.diff
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1387>.
    >



[GitHub] spark pull request: [WIP][SPARK-2595]:The driver run garbage colle...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-54778141
  
    @andrewor14 actually I think when an executor OOMs we leave it alive, because we catch Throwable when we run tasks.





[GitHub] spark pull request: The driver perform garbage collection, when th...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49390933
  
    @witgo please create a JIRA when proposing features like this.
    
    AFAIK the feature proposal is the following: if we detect memory pressure on the executors, we should try to trigger a GC on the driver so that, if there happen to be RDDs that have gone out of scope on the driver side, their associated cache blocks will be cleaned up on the executors, freeing memory.
    
    This is a bit of a hacky solution. I think overall the right strategy here is to make Spark robust enough that it's hard or impossible to trigger OutOfMemory errors even if lots of data is being cached. That's the focus of a bunch of other ongoing work right now.
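
    For background on why a driver-side GC can free executor memory at all, here is a condensed sketch of the weak-reference mechanism `ContextCleaner` relies on (simplified, with made-up names; the real code also tracks shuffles and broadcasts and handles errors):

    ```
    import java.lang.ref.{ReferenceQueue, WeakReference}
    import scala.collection.mutable.ArrayBuffer

    // The driver tracks each RDD through a weak reference. Only after user
    // code drops its last strong reference AND a GC actually runs does the
    // reference get enqueued; the cleaning thread then tells executors to
    // drop the RDD's cached blocks. That is the (indirect) link between a
    // driver-side GC and executor-side memory.
    class CleanupRef(rdd: AnyRef, val rddId: Int, queue: ReferenceQueue[AnyRef])
      extends WeakReference[AnyRef](rdd, queue)

    class MiniCleaner {
      private val queue = new ReferenceQueue[AnyRef]
      // Keep the weak references themselves strongly reachable.
      private val buffer = new ArrayBuffer[CleanupRef]

      def register(rdd: AnyRef, rddId: Int): Unit =
        buffer.synchronized { buffer += new CleanupRef(rdd, rddId, queue) }

      // Normally run in a loop on a daemon thread.
      def cleanOnce(): Unit =
        Option(queue.remove(100)).foreach {
          case ref: CleanupRef =>
            // In Spark this would become sc.unpersistRDD(ref.rddId, ...).
            println(s"cleaning up RDD ${ref.rddId}")
            buffer.synchronized { buffer -= ref }
        }
    }
    ```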



[GitHub] spark pull request: [WIP][SPARK-2595]:The driver run garbage colle...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-54382137
  
    Sorry to revive this again, but I am still confused as to what this PR achieves after reading through the conversation. When an executor JVM dies because of an OOM, the driver will attempt to trigger a GC and trigger cleaning tasks immediately. However, the executor that OOM'ed already died, and will probably not be able to respond to cleaning requests from the driver.
    
    Is the purpose of this to prevent other executors that haven't OOM'ed yet from dying as well? For instance, say I have 5 executors, and 1 of them runs OOM first. Is the purpose of this to prevent the other 4 from dying of the same cause preemptively?





[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49303783
  
    QA tests have started for PR 1387. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16779/consoleFull



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49072778
  
    @witgo the problem is that there's no reliable way to make sure the gc has run. Have you tried with all available gcs in the Oracle vm? Have you tried with different vms? I've actually worked on a JVM where `System.gc()` was a no-op. In that situation your `runGC()` method might take a really long time to finish, if it ever does.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49065130
  
    I'm not sure I understood your last comment, but anyway: these objects had references to them at some point. What you're suggesting is that, at some point, those references stopped existing, but there's still cleanup that needs to be done. And if you're doing that cleanup by using the gc, it means the cleanup is happening in a finalizer, which is bad.

    That means there is a bug in the code that is getting rid of those references: it should be doing this cleanup.

    This is no different than having to call `close()` on an InputStream. Yes, you can skip that and leave it for the GC to invoke finalizers, but that's wrong. Your code should explicitly close the streams.

    Unless I'm misunderstanding what you're trying to achieve. But regardless of what that is, `System.gc()` is *not* the answer.
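
    (A minimal illustration of the distinction:)

    ```
    import java.io.FileInputStream

    // Wrong: drop the reference and hope a finalizer eventually closes
    // the file descriptor. Whether and when that happens is up to the GC.
    def leaky(path: String): Int = new FileInputStream(path).read()

    // Right: the code that makes the resource unreachable closes it.
    def explicit(path: String): Int = {
      val in = new FileInputStream(path)
      try in.read() finally in.close()
    }
    ```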



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1387#discussion_r14953652
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskEventListener.scala ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.scheduler
    +
    +import java.io.IOException
    +import org.apache.spark.{ExceptionFailure, ContextCleaner, Logging, SparkConf}
    +
    +private[spark] class TaskEventListener(appName: String, sparkConf: SparkConf)
    +  extends SparkListener with Logging {
    +
    +  val MAX_PROPORTION = 0.7D
    +
    +  override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    +    val SparkListenerTaskEnd(stageId, taskType, reason, taskInfo, taskMetrics) = taskEnd
    +    if (reason.isInstanceOf[ExceptionFailure]) {
    +      val ef = reason.asInstanceOf[ExceptionFailure]
    +      if ((ef.className == classOf[OutOfMemoryError].getName) || ef.className ==
    +        classOf[IOException].getName && ef.description.startsWith("No space left on device")) {
    +        ContextCleaner.runGC()
    +      }
    +    } else if ((taskMetrics != null) && (taskMetrics.jvmGCTime.toDouble /
    +      taskMetrics.executorRunTime > MAX_PROPORTION)) {
    +      // TODO: Such logic is too rough?
    --- End diff --
    
    @vanzin What are your thoughts about this?



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49064468
  
    Explicit cleanup would mean keeping track of every referenced object, which is very unfriendly for Java programmers.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48841019
  
    Currently, `SparkContext.cleaner` does not take executor memory usage into account. This can cause Spark to fail when memory runs short.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49076040
  
    In my tests, the `runGC` method works normally on JDK 7u45.



[GitHub] spark pull request: [WIP][SPARK-2595]:The driver run garbage colle...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-70467444
  
    I don't think anyone was interested in championing this, so I'd like to close this issue as a won't fix.





[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48985219
  
    I agree with your point.
    But when an OutOfMemoryError is thrown, the error Spark reports is:
    ```
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 9.1:0 failed 4 times, most recent failure: Exception failure in TID 969 on host 10dian73.domain.test: java.io.FileNotFoundException: /yarn/nm/usercache/spark/appcache/application_1404728465401_0070/spark-local-20140715103235-ffda/2e/merged_shuffle_4_85_0 (No such file or directory)
            java.io.FileOutputStream.open(Native Method)
            java.io.FileOutputStream.<init>(FileOutputStream.java:221)
            org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116)
            org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177)
            org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:59)
            org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:57)
            scala.collection.Iterator$class.foreach(Iterator.scala:727)
            scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
            org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:57)
            org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:147)
            org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
            org.apache.spark.scheduler.Task.run(Task.scala:51)
            org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            java.lang.Thread.run(Thread.java:744)
    
    ```



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49075444
  
    The main problem with the `runGC` method is that it can run for a long time and still not work.



[GitHub] spark pull request: [WIP][SPARK-2595:]The driver run garbage colle...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49527449
  
    QA tests have started for PR 1387. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16852/consoleFull



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49063007
  
    But *why*? If those need to be cleaned up and are not currently, then you need to explicitly clean them up, not go through some gc-based hack.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48958962
  
    As it has been said, there is no guarantee that `System.gc()` does anything, much less that it does it synchronously as the code (sort of) assumes. Also, see ExecutorRunnableUtil.scala:
    
        val commands = Seq(Environment.JAVA_HOME.$() + "/bin/java",
          "-server",
          // Kill if OOM is raised - leverage yarn's failure handling to cause rescheduling.
          // Not killing the task leaves various aspects of the executor and (to some extent) the jvm in
          // an inconsistent state.
          // TODO: If the OOM is not recoverable by rescheduling it on different node, then do
          // 'something' to fail job ... akin to blacklisting trackers in mapred ?
          "-XX:OnOutOfMemoryError='kill %p'") ++
    
    So even if this worked, it wouldn't do anything in Yarn mode. Also, there is no guarantee that the VM will be able to run your code after an OOM occurs.




[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-49076858
  
    @witgo you do know that the Oracle VM has 3 different GCs, configurable using command line arguments, right? Have you tried all of them (and combinations thereof, since you can run different gcs in the young and old gens) in different memory pressure scenarios?
    
    These kinds of questions are why you shouldn't be trying to do that. It's sort of ok to have that code in a test, especially given what the test is trying to do. The test environment is more controlled than the environment where Spark apps themselves are deployed. So you can't just take that code, put it into the main code base, and say the issue is fixed...
    
    Also, did you take a look at https://github.com/apache/spark/pull/1387#issuecomment-49072342? That might be an easier way to achieve a "best effort" cleanup without relying on the gc API.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48841151
  
    @srowen [Executor.scala#L253](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L253) handles exceptions, but OutOfMemoryError does not seem to be handled correctly.



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1387#discussion_r14953877
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskEventListener.scala ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.scheduler
    +
    +import java.io.IOException
    +import org.apache.spark.{ExceptionFailure, ContextCleaner, Logging, SparkConf}
    +
    +private[spark] class TaskEventListener(appName: String, sparkConf: SparkConf)
    +  extends SparkListener with Logging {
    +
    +  val MAX_PROPORTION = 0.7D
    +
    +  override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    +    val SparkListenerTaskEnd(stageId, taskType, reason, taskInfo, taskMetrics) = taskEnd
    +    if (reason.isInstanceOf[ExceptionFailure]) {
    +      val ef = reason.asInstanceOf[ExceptionFailure]
    +      if ((ef.className == classOf[OutOfMemoryError].getName) || ef.className ==
    +        classOf[IOException].getName && ef.description.startsWith("No space left on device")) {
    +        ContextCleaner.runGC()
    +      }
    +    } else if ((taskMetrics != null) && (taskMetrics.jvmGCTime.toDouble /
    +      taskMetrics.executorRunTime > MAX_PROPORTION)) {
    +      // TODO: Such logic is too rough?
    --- End diff --
    
    I don't know. It's still not clear to me what it is that you're trying to fix. See my question in https://github.com/apache/spark/pull/1387#issuecomment-49070018; you pasted a bunch of logs but you haven't explained what the problem is. Is it something that causes the Spark job to do something it shouldn't? Or is it just an ugly log message you're trying to get rid of?



[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/1387#issuecomment-48985713
  
    ```
    #
    # java.lang.OutOfMemoryError: Java heap space
    # -XX:OnOutOfMemoryError="kill %p"
    #   Executing /bin/sh -c "kill 44942"...
    14/07/15 10:38:29 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
    14/07/15 10:38:29 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Connection manager future execution context-6,5,main]
    java.lang.OutOfMemoryError: Java heap space
            at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
            at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
            at org.apache.spark.storage.BlockMessage.set(BlockMessage.scala:94)
            at org.apache.spark.storage.BlockMessage$.fromByteBuffer(BlockMessage.scala:176)
            at org.apache.spark.storage.BlockMessageArray.set(BlockMessageArray.scala:63)
            at org.apache.spark.storage.BlockMessageArray$.fromBufferMessage(BlockMessageArray.scala:109)
            at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$sendRequest$1.applyOrElse(BlockFetcherIterator.scala:125)
            at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$sendRequest$1.applyOrElse(BlockFetcherIterator.scala:122)
            at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:117)
            at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:115)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
    14/07/15 10:38:29 WARN HadoopRDD: Exception in RecordReader.close()
    java.io.IOException: Filesystem closed
            at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
            at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:619)
            at java.io.FilterInputStream.close(FilterInputStream.java:181)
            at org.apache.hadoop.util.LineReader.close(LineReader.java:150)
            at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:243)
            at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:226)
            at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63)
            at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:197)
            at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
            at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
            at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63)
            at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:156)
            at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
            at org.apache.spark.scheduler.Task.run(Task.scala:51)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
    
    
    -----------------
    
    14/07/15 10:38:30 INFO Executor: Running task ID 969
    14/07/15 10:38:30 INFO BlockManager: Found block broadcast_0 locally
    14/07/15 10:38:30 INFO HadoopRDD: Input split: hdfs://10dian72.domain.test:8020/input/lbs/recommend/toona/rating/20140712/part-00007:0+68016537
    14/07/15 10:38:30 ERROR Executor: Exception in task ID 969
    java.io.FileNotFoundException: /yarn/nm/usercache/spark/appcache/application_1404728465401_0070/spark-local-20140715103235-ffda/2e/merged_shuffle_4_85_0 (No such file or directory)
            at java.io.FileOutputStream.open(Native Method)
            at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
            at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116)
            at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177)
            at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:59)
            at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:57)
            at scala.collection.Iterator$class.foreach(Iterator.scala:727)
            at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
            at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:57)
            at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:147)
            at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
            at org.apache.spark.scheduler.Task.run(Task.scala:51)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
    14/07/15 10:38:30 INFO Executor: java.io.FileNotFoundException (java.io.FileNotFoundException: /yarn/nm/usercache/spark/appcache/application_1404728465401_0070/spark-local-20140715103235-ffda/2e/merged_shuffle_4_85_0 (No such file or directory)}
    
    ```

