You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by patrickbrownsync <gi...@git.apache.org> on 2018/10/29 20:23:23 UTC

[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...

GitHub user patrickbrownsync opened a pull request:

    https://github.com/apache/spark/pull/22883

    [SPARK-25837] [Core] Fix potential slowdown in AppStatusListener when cleaning up stages

    ## What changes were proposed in this pull request?
    
    * Update `AppStatusListener` `cleanupStages` method to remove tasks for those stages in a single pass instead of 1 for each stage.
    * This fixes an issue where the cleanupStages method would get backed up, causing a backup in the executor in ElementTrackingStore, resulting in stages and jobs not getting cleaned up properly.
    
    Tasks seem most susceptible to this as there are a lot of them, however a similar issue could arise in other locations the `KVStore` `view` method is used. A broader fix might involve updates to `KVStoreView` and `InMemoryView` as it appears this interface and implementation can lead to multiple and inefficient traversals of the stored data.
    
    ## How was this patch tested?
    
    Using existing tests in AppStatusListenerSuite


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Blyncs/spark cleanup-stages-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22883.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22883
    
----
commit 4bfdac025ff0906316cf7697933a7b374ae3b427
Author: Patrick Brown <pa...@...>
Date:   2018-10-29T19:49:50Z

    Update cleanupStages in AppStatusListener to delete tasks for all stages in a single pass

commit 178f7c3bf82f93177fce086037ece6ebf09bb350
Author: Patrick Brown <pa...@...>
Date:   2018-10-29T19:55:38Z

    remove uneeded type

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...

Posted by patrickbrownsync <gi...@git.apache.org>.
Github user patrickbrownsync commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22883#discussion_r229539016
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -1105,6 +1095,15 @@ private[spark] class AppStatusListener(
     
           cleanupCachedQuantiles(key)
         }
    +
    +    // Delete tasks for all stages in one pass, as deleting them for each stage individually is slow
    --- End diff --
    
    Sure
    
    Take a look at the implementation of InMemoryView at spark/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java line 179
    
    specifically the implementation of iterator on line 193, here is an excerpt:
    
    ```
    Collections.sort(sorted, (e1, e2) -> modifier * compare(e1, e2, getter));
    Stream<T> stream = sorted.stream();
    
    if (first != null) {
        stream = stream.filter(e -> modifier * compare(e, getter, first) >= 0);
    }
    
    if (last != null) {
        stream = stream.filter(e -> modifier * compare(e, getter, last) <= 0);
    }
    ```
    
    and the original, in loop deletion code:
    
    ```
    val tasks = kvstore.view(classOf[TaskDataWrapper])	
            .index("stage")	
            .first(key)	
            .last(key)	
            .asScala	
          
    tasks.foreach { t =>	
        kvstore.delete(t.getClass(), t.taskId)	
    }
    ```
    
    So you can see, if we do this each loop we actually sort the whole collection of TaskDataWrapper which are currently in the store, then go through and check each item based on the key set (the stage). Assuming we have a large number of stages and tasks this is an O(n^2) operation, which is what happens on my production application and the repro code.
    
    If we do this in one pass for all stages, we only sort and iterate the list of tasks one time.
    
    This same pattern happens fairly frequently using the KVStoreView interface and InMemoryView implementation. Since I am new to contributing to Spark I did not undertake a massive refactor, but I would suggest that this interface and implementation should be looked at and re-designed with efficiency in mind. The current implementation favors flexibility in terms of how the dataset is sorted and filtered, but enforcing a single sort order via something like a SortedSet would hopefully make it clear when the operation being performed was efficiently searching inside the collection, and when you were using an inefficient access pattern.
    
    I hope that explains the reasoning, if you have any more questions let me know.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22883#discussion_r229535706
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -1105,6 +1095,15 @@ private[spark] class AppStatusListener(
     
           cleanupCachedQuantiles(key)
         }
    +
    +    // Delete tasks for all stages in one pass, as deleting them for each stage individually is slow
    --- End diff --
    
    Hi @patrickbrownsync , can you explain more about why deleting them in the loop above is slower?  I don't quite understand it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    **[Test build #98331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98331/testReport)** for PR 22883 at commit [`82fef86`](https://github.com/apache/spark/commit/82fef8686f4b71d94b3bfe44ea809b4d88f5fe70).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22883


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    (Merged to 2.3 also.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    **[Test build #98331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98331/testReport)** for PR 22883 at commit [`82fef86`](https://github.com/apache/spark/commit/82fef8686f4b71d94b3bfe44ea809b4d88f5fe70).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by patrickbrownsync <gi...@git.apache.org>.
Github user patrickbrownsync commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    I would be happy to start thinking about that.
    
    Can you provide more details on what you mean by data loss on write in the InMemoryStore?
    
    I now see that the types being inserted have an annotation around which fields they are allowed to be indexed by, but I don't have a good idea of how you would allow multiple indexes in a collection short of running something like SQLite in memory.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    **[Test build #98329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98329/testReport)** for PR 22883 at commit [`178f7c3`](https://github.com/apache/spark/commit/178f7c3bf82f93177fce086037ece6ebf09bb350).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Merging to master / 2.4.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Seems reasonable. Ping @vanzin 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    **[Test build #98329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98329/testReport)** for PR 22883 at commit [`178f7c3`](https://github.com/apache/spark/commit/178f7c3bf82f93177fce086037ece6ebf09bb350).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98329/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22883#discussion_r229805894
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -1105,6 +1095,15 @@ private[spark] class AppStatusListener(
     
           cleanupCachedQuantiles(key)
         }
    +
    +    // Delete tasks for all stages in one pass, as deleting them for each stage individually is slow
    +    val tasks = kvstore.view(classOf[TaskDataWrapper]).asScala
    +    val keys = stages.map(s => (s.info.stageId, s.info.attemptId)).toSet
    --- End diff --
    
    nit: `.map { s => ... }`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    btw I'm well aware that `InMemoryView` is kinda slow on the read side; improvements there would be welcome. The main issue is that the write path should be as fast as possible, since that can actually cause data loss if you have a burst of new events.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98331/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22883
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org