You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/06 20:26:00 UTC

[jira] [Updated] (SPARK-45439) Reduce memory usage of LiveStageMetrics.accumIdsToMetricType

     [ https://issues.apache.org/jira/browse/SPARK-45439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-45439:
-----------------------------------
    Labels: pull-request-available  (was: )

> Reduce memory usage of LiveStageMetrics.accumIdsToMetricType
> ------------------------------------------------------------
>
>                 Key: SPARK-45439
>                 URL: https://issues.apache.org/jira/browse/SPARK-45439
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>              Labels: pull-request-available
>
> This PR aims to reduce the memory consumption of {{{}LiveStageMetrics.accumIdsToMetricType{}}}, which should help to reduce driver memory usage when running complex SQL queries that contain many operators and run many jobs.
> In SQLAppStatusListener, the LiveStageMetrics.accumIdsToMetricType field holds a map which is used to look up the type of accumulators in order to perform conditional processing of a stage’s metrics.
> Currently, that field is derived from {{{}LiveExecutionData.metrics{}}}, which contains metrics for _all_ operators used anywhere in the query. Whenever a job is submitted, we construct a fresh map containing all metrics that have ever been registered for that SQL query. If a query runs a single job, this isn't an issue: in that case, all {{LiveStageMetrics}} instances will hold the same immutable {{{}accumIdsToMetricType{}}}.
> The problem arises if we have a query that runs many jobs (e.g. a complex query with many joins which gets divided into many jobs due to AQE): in that case, each job submission results in a new {{accumIdsToMetricType}} map being created.
> This PR fixes this by changing {{accumIdsToMetricType}} to be a mutable {{ConcurrentHashMap}} which is shared across all {{LivestageMetrics}} instances belonging to the same {{{}LiveExecutionData{}}}.
> The modified classes are {{private}} and are used only in SQLAppStatusListener, so I don't think this change poses any realistic risk of binary incompatibility risks to third party code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org