Posted to issues@spark.apache.org by "Josh Rosen (Jira)" <ji...@apache.org> on 2023/10/06 20:22:00 UTC

[jira] [Created] (SPARK-45439) Reduce memory usage of LiveStageMetrics.accumIdsToMetricType

Josh Rosen created SPARK-45439:
----------------------------------

             Summary: Reduce memory usage of LiveStageMetrics.accumIdsToMetricType
                 Key: SPARK-45439
                 URL: https://issues.apache.org/jira/browse/SPARK-45439
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Josh Rosen


This PR aims to reduce the memory consumption of {{LiveStageMetrics.accumIdsToMetricType}}, which should help to reduce driver memory usage when running complex SQL queries that contain many operators and run many jobs.

In SQLAppStatusListener, the LiveStageMetrics.accumIdsToMetricType field holds a map which is used to look up the type of accumulators in order to perform conditional processing of a stage’s metrics.

Currently, that field is derived from {{LiveExecutionData.metrics}}, which contains metrics for _all_ operators used anywhere in the query. Whenever a job is submitted, we construct a fresh map containing all metrics that have ever been registered for that SQL query. If a query runs a single job, this isn't an issue: in that case, all {{LiveStageMetrics}} instances will hold the same immutable {{accumIdsToMetricType}}.

The problem arises if we have a query that runs many jobs (e.g. a complex query with many joins which gets divided into many jobs due to AQE): in that case, each job submission results in a new {{accumIdsToMetricType}} map being created, so the driver ends up holding many largely duplicated maps for a single query.
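
To make the cost concrete, here is a rough Java sketch of the behavior described above. It is not Spark's actual (Scala) code: the class {{PerJobCopySketch}}, its methods, and the {{"sum"}} metric type are hypothetical stand-ins, and it assumes for simplicity that the set of registered metrics stays fixed while jobs are submitted. Under those assumptions, a query with J jobs and M metrics retains roughly J x M map entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the per-job copying described above.
// Not Spark's real code; all names and types are illustrative assumptions.
public class PerJobCopySketch {
    // Stand-in for LiveExecutionData.metrics: accumulator id -> metric type.
    public static Map<Long, String> queryMetrics(int numMetrics) {
        Map<Long, String> metrics = new HashMap<>();
        for (long id = 0; id < numMetrics; id++) {
            metrics.put(id, "sum");
        }
        return metrics;
    }

    // Simulates numJobs job submissions, each building a fresh map,
    // and returns the total number of map entries retained afterwards.
    public static long retainedEntries(int numMetrics, int numJobs) {
        Map<Long, String> metrics = queryMetrics(numMetrics);
        List<Map<Long, String>> perJobMaps = new ArrayList<>();
        for (int job = 0; job < numJobs; job++) {
            perJobMaps.add(new HashMap<>(metrics)); // fresh copy per job
        }
        long total = 0;
        for (Map<Long, String> m : perJobMaps) {
            total += m.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(retainedEntries(1000, 1));   // single job
        System.out.println(retainedEntries(1000, 200)); // many jobs
    }
}
```

In this model the single-job case keeps one copy of the metrics, while the 200-job case keeps 200 copies of the same 1000 entries.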

This PR fixes this by changing {{accumIdsToMetricType}} to be a mutable {{ConcurrentHashMap}} which is shared across all {{LiveStageMetrics}} instances belonging to the same {{LiveExecutionData}}.
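
The sharing idea can be sketched in Java as follows. This is only an illustration under stated assumptions, not the actual patch: {{Execution}} and {{StageMetrics}} are hypothetical, heavily simplified stand-ins for {{LiveExecutionData}} and {{LiveStageMetrics}}, and metric types are modeled as plain strings.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of sharing one mutable map per execution.
// Not Spark's real code; names and signatures are illustrative assumptions.
public class SharedMapSketch {
    // Stand-in for LiveExecutionData: owns the single shared map.
    public static final class Execution {
        final ConcurrentHashMap<Long, String> accumIdsToMetricType =
            new ConcurrentHashMap<>();

        // Registering a metric mutates the shared map in place.
        public void registerMetric(long accumId, String metricType) {
            accumIdsToMetricType.put(accumId, metricType);
        }

        // Each new stage receives a reference to the shared map, not a copy.
        public StageMetrics newStage() {
            return new StageMetrics(accumIdsToMetricType);
        }
    }

    // Stand-in for LiveStageMetrics: only reads from the shared map.
    public static final class StageMetrics {
        final ConcurrentHashMap<Long, String> accumIdsToMetricType;

        StageMetrics(ConcurrentHashMap<Long, String> shared) {
            this.accumIdsToMetricType = shared;
        }

        // Returns the metric type, or null if the id is not registered.
        public String metricType(long accumId) {
            return accumIdsToMetricType.get(accumId);
        }
    }

    public static void main(String[] args) {
        Execution exec = new Execution();
        exec.registerMetric(1L, "sum");
        StageMetrics first = exec.newStage();
        exec.registerMetric(2L, "size"); // registered after 'first' was created
        StageMetrics second = exec.newStage();
        // Both stages hold the same map instance.
        System.out.println(first.accumIdsToMetricType == second.accumIdsToMetricType);
        // The earlier stage also sees the later registration.
        System.out.println(first.metricType(2L));
    }
}
```

Because every stage holds a reference to the same map, the number of retained entries in this sketch depends on the number of metrics rather than on the number of jobs.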

The modified classes are {{private}} and are used only in SQLAppStatusListener, so I don't think this change poses any realistic risk of binary incompatibility for third-party code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
