You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by 2ooom <gi...@git.apache.org> on 2017/08/06 12:24:34 UTC

[GitHub] spark pull request #18860: [SPARK-21254] [WebUI] History UI performance fixe...

GitHub user 2ooom opened a pull request:

    https://github.com/apache/spark/pull/18860

    [SPARK-21254] [WebUI] History UI performance fixes 

    ## This is a backport of PR #18783 to the latest released branch 2.2.  
    
    ## What changes were proposed in this pull request?
    
    As described in JIRA ticket, History page is taking ~1min to load for cases when amount of jobs is 10k+.
    Most of the time is currently being spent on DOM manipulations and all additional costs implied by this (browser repaints and reflows).
    PR's goal is not to change any behavior but to optimize time of History UI rendering:
    
    1. The most costly operation is setting `innerHTML` for `duration` column within a loop, which is [extremely unperformant](https://jsperf.com/jquery-append-vs-html-list-performance/24). [Refactoring ](https://github.com/criteo-forks/spark/commit/b7e56eef4d66af977bd05af58a81e14faf33c211) this helped to get page load time **down to 10-15s**
    
    2. Second big gain bringing page load time **down to 4s** was [was achieved](https://github.com/criteo-forks/spark/commit/3630ca212baa94d60c5fe7e4109cf6da26288cec) by detaching table's DOM before parsing it with DataTables jQuery plugin.
    
    3. Another chunk of improvements ([1]https://github.com/criteo-forks/spark/commit/aeeeeb520d156a7293a707aa6bc053a2f83b9ac2), [2](https://github.com/criteo-forks/spark/commit/e25be9a66b018ba0cc53884f242469b515cb2bf4), [3](https://github.com/criteo-forks/spark/commit/91697079a29138b7581e64f2aa79247fa1a4e4af)) was focused on removing unnecessary DOM manipulations that in total contributed ~250ms to page load time.
    
    ## How was this patch tested?
    
    Tested by existing Selenium tests in `org.apache.spark.deploy.history.HistoryServerSuite`.
    
    Changes were also tested on Criteo's spark-2.1 fork with 20k+ number of rows in the table, reducing load time to 4s.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/criteo-forks/spark history-ui-perf-fix-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18860
    
----
commit b7e56eef4d66af977bd05af58a81e14faf33c211
Author: Dmitry Parfenchik <d....@criteo.com>
Date:   2017-06-05T20:42:41Z

    [SPARK-21254] [WebUI] History UI: Improving performance by detaching table DOM before processing
    
    Currently all the DOM manipulations are handled in a loop after Mustache
    template is parsed. This causes severe performance issues especially within
    loops iteration over thousands of (attempt/application) records and causing
    all kinds of unnecessary browser work: reflow, repaint, etc.
    
    This could be easily fixed by preparing a DOM node beforehand and doing all
    manipulations within the loops on detached node, reattaching it to the document
    only after the work is done.
    
    The most costly operation in this case was setting innerHTML for `duration`
    column within a loop, which is extremely unperformant:
    
    https://jsperf.com/jquery-append-vs-html-list-performance/24
    
    While duration parsing could be done before mustache-template processing without
    any additional DOM alteratoins.

commit aeeeeb520d156a7293a707aa6bc053a2f83b9ac2
Author: Dmitry Parfenchik <d....@criteo.com>
Date:   2017-07-30T14:52:24Z

    [SPARK-21254] [WebUI] Performance optimization for pagination check
    
    Check whether to display pagination or not on large data sets (10-20k rows)
    was taking up to 50ms because it was iterating over all rows. This could be
    easily done by testing length of array before passing it to mustache.

commit e25be9a66b018ba0cc53884f242469b515cb2bf4
Author: Dmitry Parfenchik <d....@criteo.com>
Date:   2017-07-30T15:23:37Z

    [SPARK-21254] [WebUI] Performance improvement: removing unnecessary DOM processing
    
    Logic related to `hasMultipleAttempts` flag:
    
     - Hiding attmptId column (if `hasMultipleAttempts = false`)
     - Seting white background color for first 2 columns (if `hasMultipleAttempts = true`)
    
    was updating DOM after mustache template processing, which was causing 2 unnecessary
    iterations over full data set (first through jquery selector, than through for-loop).
    
    Refactoring it inside mustache template helps saving 80-90ms on large data sets (10k+ rows)

commit 91697079a29138b7581e64f2aa79247fa1a4e4af
Author: Dmitry Parfenchik <d....@criteo.com>
Date:   2017-07-30T20:06:32Z

    [SPARK-21254] [WebUI] Performance improvement: further reducing DOM manipulations
    
    Refactoring incomplete requests filter behavior due to inefficency in DOM
    manipulations. We were traversing DOM 2 more times just to hide columns
    that we could have avoided rendering in mustache. Factoring this logic in
    mustache template (`showCompletedColumn`) saves 70-80ms on 10k+ rows.

commit 3630ca212baa94d60c5fe7e4109cf6da26288cec
Author: Dmitry Parfenchik <d....@criteo.com>
Date:   2017-07-30T20:26:59Z

    [SPARK-21254] [WebUI] Performance improvements: detaching DOM before DataTables plugin processing
    
    Detaching history table wrapper from document before parsing it with DataTables plugin
    and reattaching back right after plugin has processed nested DOM. This allows to avoid
    huge amount of browser repaints and reflows, reducing initial page load time in Chrome
    from 15s to 4s for 20k+ rows

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    Do we really need to backport this? @srowen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by ajbozarth <gi...@git.apache.org>.
Github user ajbozarth commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    I like the idea of back porting this, but I agree with @srowen that this falls into a grey area between an improvement and a bug fix. If there's opposition to the back-port I won't argue, but I'm personally in favor.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    @2ooom merged to 2.2. You can close this PR now; it won't auto-close.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    @2ooom how useful would that be for y'all ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    Yes, tough call on the back-port. I'd normally say we don't back-port optimizations to maintenance branches. If the optimization relieved a significant usability problem, you could consider it a bug fix. For this one I'm on the fence about it, and would prefer to hear someone else in support of it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by 2ooom <gi...@git.apache.org>.
Github user 2ooom commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    @srowen At criteo the issue has grown critical about 3 month ago and at that point (we had 11k rows in history table at a time and) when history page was taking at least 1min to load.  Now we're at 26k+ rows and we're increasing spark adoption.
    So I would actually treat this PR as a bug fix and try to merge in all supported branches (2.2, 2.3, maybe even 2.1) to save frustration for other spark-users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    **[Test build #3911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3911/testReport)** for PR 18860 at commit [`3630ca2`](https://github.com/apache/spark/commit/3630ca212baa94d60c5fe7e4109cf6da26288cec).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    **[Test build #3880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3880/testReport)** for PR 18860 at commit [`3630ca2`](https://github.com/apache/spark/commit/3630ca212baa94d60c5fe7e4109cf6da26288cec).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by 2ooom <gi...@git.apache.org>.
Github user 2ooom commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    Thank you @srowen


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    **[Test build #3911 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3911/testReport)** for PR 18860 at commit [`3630ca2`](https://github.com/apache/spark/commit/3630ca212baa94d60c5fe7e4109cf6da26288cec).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18860: [SPARK-21254] [WebUI] History UI performance fixes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18860
  
    **[Test build #3880 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3880/testReport)** for PR 18860 at commit [`3630ca2`](https://github.com/apache/spark/commit/3630ca212baa94d60c5fe7e4109cf6da26288cec).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18860: [SPARK-21254] [WebUI] History UI performance fixe...

Posted by 2ooom <gi...@git.apache.org>.
Github user 2ooom closed the pull request at:

    https://github.com/apache/spark/pull/18860


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org