You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tashoyan <gi...@git.apache.org> on 2017/11/08 22:12:09 UTC

[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

GitHub user tashoyan opened a pull request:

    https://github.com/apache/spark/pull/19700

    [SPARK-22471][SQL] SQLListener consumes much memory causing OutOfMemoryError

    ## What changes were proposed in this pull request?
    
    This PR addresses the issue [SPARK-22471](https://issues.apache.org/jira/browse/SPARK-22471). The modified version of `SQLListener` respects the setting `spark.ui.retainedStages` and keeps the number of the tracked stages within the specified limit. The hash map `_stageIdToStageMetrics` does not outgrow the limit, hence overall memory consumption does not grow with time anymore.
    
    ## How was this patch tested?
    
    A new unit test covers this fix - see `SQLListenerMemorySuite.scala`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tashoyan/spark SPARK-22471

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19700
    
----
commit 0388f6ce50d568a0493e7959ec005ee5afc20bd0
Author: Arseniy Tashoyan <ta...@gmail.com>
Date:   2017-11-08T15:41:36Z

    Add reproducer for the issue SPARK-22471

commit 42e80272cf0926f0fd978e6b7617685987d8fc93
Author: Arseniy Tashoyan <ta...@gmail.com>
Date:   2017-11-08T15:41:54Z

    Add fix for the issue SPARK-22471

commit 2f793ad1f001bc58dd09fa4eaec6ae423445f86f
Author: Arseniy Tashoyan <ta...@gmail.com>
Date:   2017-11-08T20:39:02Z

    Remove debug print and irrelevant checks. Add a reference to the issue.

commit 4780d95b7d58df741eb8d5756c8109fc7dbfb457
Author: Arseniy Tashoyan <ta...@gmail.com>
Date:   2017-11-08T20:47:44Z

    Remove debug print and irrelevant checks. Add a reference to the issue.

commit 79c83a715d4a36ad00ff3888e8e2953fcc163d17
Author: Arseniy Tashoyan <ta...@gmail.com>
Date:   2017-11-08T21:21:42Z

    Collect memory-related tests on SQLListener in the same suite

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    I'm a little on the fence about considering this for master given #19681 is probably going to merged some time soon, but it should be ok for 2.2.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    +1 for @vanzin 's advice.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by tashoyan <gi...@git.apache.org>.
Github user tashoyan commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    @vanzin would you like to review?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    **[Test build #83653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83653/testReport)** for PR 19700 at commit [`79c83a7`](https://github.com/apache/spark/commit/79c83a715d4a36ad00ff3888e8e2953fcc163d17).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by tashoyan <gi...@git.apache.org>.
Github user tashoyan commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    Well, it would be good to have this quick fix in a 2.2-compatible bugfix release, without waiting for 2.3.0.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19700#discussion_r150041775
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
    @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
     
       private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
     
    +  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    --- End diff --
    
    BTW, the name should be `spark.sql.ui.retainedStages` instead of `spark.ui.retainedStages`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    **[Test build #83653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83653/testReport)** for PR 19700 at commit [`79c83a7`](https://github.com/apache/spark/commit/79c83a715d4a36ad00ff3888e8e2953fcc163d17).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83653/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19700#discussion_r150063165
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
    @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
     
       private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
     
    +  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    --- End diff --
    
    @tashoyan . Since you are not introducing a new one, could you use the existing default value?
    ```scala
    -  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    +  private val retainedStages =
    +    conf.getInt("spark.ui.retainedStages", SparkUI.DEFAULT_RETAINED_STAGES)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19700#discussion_r150054899
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
    @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
     
       private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
     
    +  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    --- End diff --
    
    Ah. My bad. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    You can open the PR directly against 2.2.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by tashoyan <gi...@git.apache.org>.
Github user tashoyan commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    Done for branch-2.2: #19711 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19700#discussion_r150041314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
    @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
     
       private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
     
    +  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    --- End diff --
    
    @tashoyan . Could you add a doc for this like `spark.sql.ui.retainedExecutions` here?
    Please refer #9052.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

Posted by tashoyan <gi...@git.apache.org>.
Github user tashoyan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19700#discussion_r150052324
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
    @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
     
       private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
     
    +  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    --- End diff --
    
    @dongjoon-hyun , It is already documented in the same file configuration.md:
    ```
    How many stages the Spark UI and status APIs remember before garbage collecting.
    This is a target maximum, and fewer elements may be retained in some circumstances.
    ```
    I did not involve a new parameter, I just used an existing one.
    Regarding renaming to `spark.sql.ui.retainedStages`, I believe it should be done in a separate pull request - if should. This parameter is also used in other parts of Spark code, not only SQL.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

Posted by tashoyan <gi...@git.apache.org>.
Github user tashoyan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19700#discussion_r150100750
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
    @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
     
       private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
     
    +  private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    --- End diff --
    
    Done for branch-2.2


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19700
  
    @tashoyan you can close out this one.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...

Posted by tashoyan <gi...@git.apache.org>.
Github user tashoyan closed the pull request at:

    https://github.com/apache/spark/pull/19700


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org