You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2015/02/28 21:24:59 UTC

[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/4835

    [SPARK-6075] Fix bug in that caused lost accumulator updates: do not store WeakReferences in localAccums map

    This fixes a non-deterministic bug introduced in #4021 that could cause tasks' accumulator updates to be lost.  The problem is that `localAccums` should not hold weak references: after the task finishes running there won't be any strong references to these local accumulators, so they can get garbage-collected before the executor reads the `localAccums` map.  We don't need weak references here anyways, since this map is cleared at the end of each task.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-6075

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4835
    
----
commit 120c7b085008e708e5586169eb33fb01b3a733d1
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-02-28T20:19:34Z

    [SPARK-6075] Do not store WeakReferences in localAccums map

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76548024
  
    I still think that we should investigate / add that assertion, since correctness of accumulators relies on the assumption that a deserialized task has only one instance of an an accumulator with a given id, but it looks like this issue existed prior to this recent WeakReference change; I'll roll back my new assertion and follow up on that issue in a separate patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76546841
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28131/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76544337
  
      [Test build #28131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28131/consoleFull) for   PR 4835 at commit [`120c7b0`](https://github.com/apache/spark/commit/120c7b085008e708e5586169eb33fb01b3a733d1).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76544221
  
    /cc @andrewor14, it turns out that the "flaky" accumulator test was actually a real bug (fixed by this patch).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76546836
  
      [Test build #28131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28131/consoleFull) for   PR 4835 at commit [`120c7b0`](https://github.com/apache/spark/commit/120c7b085008e708e5586169eb33fb01b3a733d1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76556097
  
      [Test build #28133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28133/consoleFull) for   PR 4835 at commit [`4f4b5b2`](https://github.com/apache/spark/commit/4f4b5b2afd2788c5e25c59db7c1fb683678aac40).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76548673
  
      [Test build #28133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28133/consoleFull) for   PR 4835 at commit [`4f4b5b2`](https://github.com/apache/spark/commit/4f4b5b2afd2788c5e25c59db7c1fb683678aac40).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76556108
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28133/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76892275
  
    I think this PR also fixes [SPARK-6020] [1], since `InMemoryColumnarTableScan` uses accumulator to generate debugging information which is dedicated for `PartitionBatchPruningSuite`. The test failures showed in SPARK-6020 suggested that accumulator updates may get lost nondeterminstically.
    
    [1]: https://issues.apache.org/jira/browse/SPARK-6020


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76546964
  
    That's a legitimate test failure caused by a new assertion that I added here (being extra defensive about avoiding duplicate accumulator registration, since this might cause silent data loss):
    
    ```
    Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 4, localhost): java.io.IOException: java.lang.IllegalArgumentException: requirement failed: Accumulator 2 has already been registered  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1165)  at org.apache.spark.Accumulable.readObject(Accumulators.scala:133)  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  at java.lang.reflect.Method.invoke(Method.java:606)  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)  at java.io.O
 bjectInputStream.defaultReadFields(ObjectInputStream.java:1990)  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)  at java.io.ObjectInputStream.rea
 dSerialData(ObjectInputStream.java:1915)  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)  at 
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4835


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76544242
  
    This only affects `master` for the last 6 days or so, by the way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6075] Fix bug in that caused lost accum...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4835#issuecomment-76584666
  
    I'm going to commit this to `master` (1.4.0) in order to fix the bug and failing tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org