You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Marcelo Masiero Vanzin (Jira)" <ji...@apache.org> on 2019/10/22 23:59:00 UTC

[jira] [Created] (SPARK-29562) SQLAppStatusListener metrics aggregation is slow and memory hungry

Marcelo Masiero Vanzin created SPARK-29562:
----------------------------------------------

             Summary: SQLAppStatusListener metrics aggregation is slow and memory hungry
                 Key: SPARK-29562
                 URL: https://issues.apache.org/jira/browse/SPARK-29562
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Marcelo Masiero Vanzin


While {{SQLAppStatusListener}} was added in 2.3, the aggregation code is very similar to what it was previously, so I'm sure this is even older.

Long story short, the aggregation code ({{SQLAppStatusListener.aggregateMetrics}}) is very, very slow, and can take a non-trivial amount of time with large queries, aside from using a ton of memory.

There are also cascading issues caused by that: since it's called from an event handler, it can slow down event processing, causing events to be dropped, which can cause listeners to miss important events that would tell them to free up internal state (and, thus, memory).

To given an anecdotal example, one app I looked at ran into the "events being dropped" issue, which caused the listener to accumulate state for 100s of live stages, even though most of them were already finished. That lead to a few GB of memory being wasted due to finished stages that were still being tracked.

Here, though, I'd like to focus on {{SQLAppStatusListener.aggregateMetrics}} and making it faster. We should look at the other issues (unblocking event processing, cleaning up of stale data in listeners) separately.

(I also remember someone in the past trying to fix something in this area, but couldn't find a PR nor an open bug.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org