Posted to reviews@spark.apache.org by witgo <gi...@git.apache.org> on 2014/05/07 07:13:20 UTC

[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

GitHub user witgo opened a pull request:

    https://github.com/apache/spark/pull/677

    SPARK-1712: TaskDescription instance is too big causes Spark to hang

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark SPARK-1712

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #677
    
----
commit e6578400ce58104d2b022f62110ac83f82a92872
Author: witgo <wi...@qq.com>
Date:   2014-05-07T05:12:34Z

    SPARK-1712: TaskDescription instance is too big causes Spark to hang

----



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12363827
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    +            val msg = "Serialized task %s:%d were %d bytes which " +
    --- End diff --
    
    It would be nicer to use string interpolation here to make it easier to parse.
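    For illustration only, a minimal sketch of the interpolated form being suggested;
    the identifiers (taskSet.id, index, serializedTask, akkaFrameSize) are assumed
    from the quoted diff, not copied from the final patch:

        // format-string style, as in the quoted diff
        val msgFormat = ("Serialized task %s:%d was %d bytes which exceeds the " +
          "Akka frame size (%d bytes)")
          .format(taskSet.id, index, serializedTask.limit, akkaFrameSize)

        // equivalent with string interpolation, keeping each value next to its label
        val msgInterp = s"Serialized task ${taskSet.id}:$index was ${serializedTask.limit} " +
          s"bytes which exceeds the Akka frame size ($akkaFrameSize bytes)"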



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12364626
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    --- End diff --
    
    @kayousterhout - you added this. Any reason you left the 1024 bytes of wiggle room?



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12363614
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    --- End diff --
    
    What is the 1024 for?



[GitHub] spark pull request: [WIP]SPARK-1712: TaskDescription instance is t...

Posted by witgo <gi...@git.apache.org>.
Github user witgo closed the pull request at:

    https://github.com/apache/spark/pull/677



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/677#issuecomment-42390402
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12364638
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    --- End diff --
    
    `serializedTask` => `4356 bytes`
    `LaunchTask` => `4797 bytes`
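    
    (That is 4797 - 4356 = 441 bytes of overhead for the rest of the LaunchTask
    message beyond the serialized task itself, which fits within the 1024-byte
    headroom being discussed.)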



[GitHub] spark pull request: [WIP]SPARK-1712: TaskDescription instance is t...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12366119
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    --- End diff --
    
    ah I see - it's to leave room for the other contents of the message. In the second case here, do you know what is taking all the extra room? 2704141 - 2690585 = ~13KB which is very large for a few string/int fields!
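    
    One hedged way to make the intent of the subtraction explicit is to name the
    headroom; the constant name and the body of the branch below are assumptions
    for illustration, not the committed code:

        // Reserve space in the Akka frame for the fields that wrap the serialized
        // task inside the LaunchTask message (task id, executor id, name, ...).
        val reservedSizeBytes = 1024  // assumed headroom, as in the quoted diff
        if (serializedTask.limit >= akkaFrameSize - reservedSizeBytes) {
          // fail fast with a descriptive error instead of letting the oversized
          // message be dropped, which is what causes the job to hang
        }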



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12363925
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    --- End diff --
    
    The reference [Executor.scala#L235](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L235) may not fit here.



[GitHub] spark pull request: [WIP]SPARK-1712: TaskDescription instance is t...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12366698
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    --- End diff --
    
    Yes, the problem is that the value is uncertain.



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/677#discussion_r12364078
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
               // we assume the task can be serialized without exceptions.
               val serializedTask = Task.serializeWithDependencies(
                 task, sched.sc.addedFiles, sched.sc.addedJars, ser)
    +          if (serializedTask.limit >= akkaFrameSize - 1024) {
    +            val msg = "Serialized task %s:%d were %d bytes which " +
    --- End diff --
    
    The reason for this is to keep the code style consistent with the surrounding code.



[GitHub] spark pull request: [WIP]SPARK-1712: TaskDescription instance is t...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/677#issuecomment-42442064
  
    @pwendell
    How about [this solution](https://github.com/witgo/spark/compare/SPARK-1712_new)?



[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/677#issuecomment-42396862
  
    Thanks for tracking down this fix. It looks good - added some small comments.

