You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by techaddict <gi...@git.apache.org> on 2014/05/09 05:18:21 UTC

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

GitHub user techaddict opened a pull request:

    https://github.com/apache/spark/pull/707

    SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo

    This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/techaddict/spark SPARK-1775

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #707
    
----
commit 18d8ebf3d04cd71d13ca95b432024f4d1f8c0718
Author: Sandeep <sa...@techaddict.me>
Date:   2014-05-09T03:14:05Z

    SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo
    This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42630504
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42630415
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by techaddict <gi...@git.apache.org>.
Github user techaddict commented on a diff in the pull request:

    https://github.com/apache/spark/pull/707#discussion_r12464716
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
    @@ -58,15 +58,13 @@ private[spark] object ShuffleMapTask {
       }
     
       def deserializeInfo(stageId: Int, bytes: Array[Byte]): (RDD[_], ShuffleDependency[_,_]) = {
    -    synchronized {
    -      val loader = Thread.currentThread.getContextClassLoader
    -      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    -      val ser = SparkEnv.get.closureSerializer.newInstance()
    -      val objIn = ser.deserializeStream(in)
    -      val rdd = objIn.readObject().asInstanceOf[RDD[_]]
    -      val dep = objIn.readObject().asInstanceOf[ShuffleDependency[_,_]]
    -      (rdd, dep)
    -    }
    +    val loader = Thread.currentThread.getContextClassLoader
    --- End diff --
    
    sure.
    
    
    On Fri, May 9, 2014 at 10:58 AM, Patrick Wendell
    <no...@github.com>wrote:
    
    > In core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala:
    >
    > > @@ -58,15 +58,13 @@ private[spark] object ShuffleMapTask {
    > >    }
    > >
    > >    def deserializeInfo(stageId: Int, bytes: Array[Byte]): (RDD[_], ShuffleDependency[_,_]) = {
    > > -    synchronized {
    > > -      val loader = Thread.currentThread.getContextClassLoader
    > > -      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    > > -      val ser = SparkEnv.get.closureSerializer.newInstance()
    > > -      val objIn = ser.deserializeStream(in)
    > > -      val rdd = objIn.readObject().asInstanceOf[RDD[_]]
    > > -      val dep = objIn.readObject().asInstanceOf[ShuffleDependency[_,_]]
    > > -      (rdd, dep)
    > > -    }
    > > +    val loader = Thread.currentThread.getContextClassLoader
    >
    > I can just remove this code on merge - it's dead code
    >
    > —
    > Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/707/files#r12464691>
    > .
    >


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42631774
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14836/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42631585
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/707#discussion_r12464288
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
    @@ -58,15 +58,13 @@ private[spark] object ShuffleMapTask {
       }
     
       def deserializeInfo(stageId: Int, bytes: Array[Byte]): (RDD[_], ShuffleDependency[_,_]) = {
    -    synchronized {
    -      val loader = Thread.currentThread.getContextClassLoader
    -      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    -      val ser = SparkEnv.get.closureSerializer.newInstance()
    -      val objIn = ser.deserializeStream(in)
    -      val rdd = objIn.readObject().asInstanceOf[RDD[_]]
    -      val dep = objIn.readObject().asInstanceOf[ShuffleDependency[_,_]]
    -      (rdd, dep)
    -    }
    +    val loader = Thread.currentThread.getContextClassLoader
    --- End diff --
    
    Yeah I'd do that


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42630494
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by techaddict <gi...@git.apache.org>.
Github user techaddict commented on a diff in the pull request:

    https://github.com/apache/spark/pull/707#discussion_r12463894
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
    @@ -58,15 +58,13 @@ private[spark] object ShuffleMapTask {
       }
     
       def deserializeInfo(stageId: Int, bytes: Array[Byte]): (RDD[_], ShuffleDependency[_,_]) = {
    -    synchronized {
    -      val loader = Thread.currentThread.getContextClassLoader
    -      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    -      val ser = SparkEnv.get.closureSerializer.newInstance()
    -      val objIn = ser.deserializeStream(in)
    -      val rdd = objIn.readObject().asInstanceOf[RDD[_]]
    -      val dep = objIn.readObject().asInstanceOf[ShuffleDependency[_,_]]
    -      (rdd, dep)
    -    }
    +    val loader = Thread.currentThread.getContextClassLoader
    --- End diff --
    
    Should i remove it then ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/707#discussion_r12463199
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
    @@ -58,15 +58,13 @@ private[spark] object ShuffleMapTask {
       }
     
       def deserializeInfo(stageId: Int, bytes: Array[Byte]): (RDD[_], ShuffleDependency[_,_]) = {
    -    synchronized {
    -      val loader = Thread.currentThread.getContextClassLoader
    -      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    -      val ser = SparkEnv.get.closureSerializer.newInstance()
    -      val objIn = ser.deserializeStream(in)
    -      val rdd = objIn.readObject().asInstanceOf[RDD[_]]
    -      val dep = objIn.readObject().asInstanceOf[ShuffleDependency[_,_]]
    -      (rdd, dep)
    -    }
    +    val loader = Thread.currentThread.getContextClassLoader
    --- End diff --
    
    I think this value is not used anymore.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42630110
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42631586
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14835/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42630554
  
    This change seems reasonable. Going further, I wonder if we could also just use a ConcurrentHashMap instead of synchronizing on the serializedInfoCache.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42630101
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/707#discussion_r12464691
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
    @@ -58,15 +58,13 @@ private[spark] object ShuffleMapTask {
       }
     
       def deserializeInfo(stageId: Int, bytes: Array[Byte]): (RDD[_], ShuffleDependency[_,_]) = {
    -    synchronized {
    -      val loader = Thread.currentThread.getContextClassLoader
    -      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    -      val ser = SparkEnv.get.closureSerializer.newInstance()
    -      val objIn = ser.deserializeStream(in)
    -      val rdd = objIn.readObject().asInstanceOf[RDD[_]]
    -      val dep = objIn.readObject().asInstanceOf[ShuffleDependency[_,_]]
    -      (rdd, dep)
    -    }
    +    val loader = Thread.currentThread.getContextClassLoader
    --- End diff --
    
    I can just remove this code on merge - it's dead code


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/707#issuecomment-42631772
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1775: Unneeded lock in ShuffleMapTask.de...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/707


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---