You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2014/08/19 04:33:37 UTC

[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/2028

    [SPARK-3116] Remove the excessive lockings in TorrentBroadcast

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark torrentBroadcast

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2028.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2028
    
----
commit 03a522163a95c23e4c0234f37db41235479c0303
Author: Reynold Xin <rx...@apache.org>
Date:   2014-08-19T02:26:46Z

    [SPARK-3116] Remove the excessive lockings in TorrentBroadcast

commit 92c62a5361e4d26776b8969e162e32b99e54f7b8
Author: Reynold Xin <rx...@apache.org>
Date:   2014-08-19T02:32:59Z

    Revert the MEMORY_AND_DISK_SER changes.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2028


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52585472
  
    Yes. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52583668
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18808/consoleFull) for   PR 2028 at commit [`92c62a5`](https://github.com/apache/spark/commit/92c62a5361e4d26776b8969e162e32b99e54f7b8).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52589813
  
    Regarding the first synchronized in that constructor, my understanding is that the original `TorrentBroadcast.synchronized` was guarding the writing of the broadcast value to the BlockManager.  I think that this synchronization is unnecessary _not_ because we don't leak `this`, but because BlockManager is thread-safe and the uniqueness of broadcast ids means that we'll never have parallel attempts to write the same block.
    
    My point was that _multiple TorrentBroadcast objects_ can be simultaneously under construction, but that it's safe because they'll have distinct ids; I think preventing leakage of a partially-constructed object is a separate issue that doesn't come into play here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52586129
  
    LGTM.Alright I looked through the code and tried to verify this change was safe. Just summarizing what I saw while walking through the code (in case it helps in the future)
    
    1. The first synchronized is in the constructor which two threads cannot call at the same time
    2. `sendBroadcast` is only called from constructor, so that is safe to remove
    3. `receiveBroadcast` is only called from a synchronized block in readObject, so that should be safe to remove (however this should be free in java, to re-acquire an already held lock)
    4. `unpersist` doesn't touch the class, so it is thread-safe (assuming BlockManager's readBroadcast is thread-safe -- I didn't check that)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52584440
  
    Can you clarify what is the guarantee that we are trying to maintain in this class ? Is it that if two executor threads try to read the same broadcast variable we want it to be safe ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52586737
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18808/consoleFull) for   PR 2028 at commit [`92c62a5`](https://github.com/apache/spark/commit/92c62a5361e4d26776b8969e162e32b99e54f7b8).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(input: String = "data/mllib/sample_linear_regression_data.txt")`
      * `  case class Params(input: String = "data/mllib/sample_linear_regression_data.txt")`
      * `  case class Params(input: String = "data/mllib/sample_binary_classification_data.txt")`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52583428
  
    cc @mosharaf @shivaram 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52588649
  
    Hmm -- according to JLS [1], objects are only made available to other threads after the constructor finishes. So technically only one thread can call an object's constructor. However constructors are not thread safe if we expose a reference to `this` to other classes / threads [2].
    
    In this case I didn't see the reference leaking, but I guess it doesn't hurt to track down the callers to verify things.
    
    [1] http://docs.oracle.com/javase/specs/jls/se5.0/html/classes.html#8.8.3
    [2] http://stackoverflow.com/questions/4880168/why-cant-java-constructors-be-synchronized


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52587376
  
    Ok I'm merging this in master & branch-1.1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3116] Remove the excessive lockings in ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2028#issuecomment-52588018
  
    > The first synchronized is in the constructor which two threads cannot call at the same time
    
    Actually, I think two threads _could_ get into the constructor simultaneously, but that it wouldn't be a problem because the calls would be for different broadcast ids.  ( `new TorrentBroadcast` is only called from `TorrentBroadcastFactory.newBroadcast`, which is only called from `BroadcastManager.newBroadcast`.  This, in turn, is only called from `SparkContext.broadcast`, which isn't synchronized.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org