Posted to issues@spark.apache.org by "Gengliang Wang (Jira)" <ji...@apache.org> on 2020/05/22 07:22:00 UTC

[jira] [Updated] (SPARK-31793) Reduce the memory usage in file scan location metadata

     [ https://issues.apache.org/jira/browse/SPARK-31793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-31793:
-----------------------------------
    Summary: Reduce the memory usage in file scan location metadata  (was: Reduce the memory usage in data source scan metadata)

> Reduce the memory usage in file scan location metadata
> ------------------------------------------------------
>
>                 Key: SPARK-31793
>                 URL: https://issues.apache.org/jira/browse/SPARK-31793
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> Currently, the data source scan node stores all of its file paths in its metadata. The metadata is retained when a SparkPlan is converted into a SparkPlanInfo, which can be used to construct the Spark plan graph in the UI.
> However, the list of paths can be very large (e.g. many partitions may remain after partition pruning), while the UI pages need at most 100 bytes of location metadata. We can reduce the paths stored in the metadata to lower memory usage.
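
The idea in the description can be illustrated with a minimal sketch: build the location string incrementally and stop appending paths once a size budget is reached, so the full path list is never materialized in the metadata. This is not Spark's implementation. The function name `build_location_metadata` and the parameter `stop_appending_threshold` are hypothetical, and the sketch counts characters rather than bytes for simplicity.

```python
from typing import Iterable


def build_location_metadata(paths: Iterable[str],
                            stop_appending_threshold: int = 100) -> str:
    """Return a bracketed, comma-separated prefix of `paths`.

    Hypothetical helper: appending stops as soon as the string built so
    far has reached `stop_appending_threshold` characters, so the result
    stays small no matter how many paths the scan covers.
    """
    parts = []
    length = 1  # accounts for the opening "["
    for path in paths:
        if length >= stop_appending_threshold:
            break  # budget reached; the remaining paths are dropped
        if parts:
            length += 2  # accounts for the ", " separator
        parts.append(path)
        length += len(path)
    return "[" + ", ".join(parts) + "]"


if __name__ == "__main__":
    # Assumed example paths: 10,000 partition directories.
    many_paths = (f"hdfs://nn/warehouse/t/part={i}" for i in range(10_000))
    print(build_location_metadata(many_paths))
```

Because the loop checks the threshold before each append, the last path added may push the string somewhat past the budget; the result is bounded by the threshold plus the length of one path, which is enough for a display-only field.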


