Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/11/09 18:33:00 UTC

[jira] [Assigned] (SPARK-40281) Memory Profiler on Executors

     [ https://issues.apache.org/jira/browse/SPARK-40281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40281:
------------------------------------

    Assignee: Apache Spark

> Memory Profiler on Executors
> ----------------------------
>
>                 Key: SPARK-40281
>                 URL: https://issues.apache.org/jira/browse/SPARK-40281
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Apache Spark
>            Priority: Major
>
> Profiling is critical to performance engineering, and memory consumption is a key indicator of how efficiently a PySpark program runs. There is an existing tool for memory profiling of Python programs, Memory Profiler (https://pypi.org/project/memory-profiler/).
> PySpark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in the driver program. On the driver side, PySpark is a regular Python process, so we can profile it as a normal Python program using Memory Profiler (a driver-side sketch follows below the quoted description).
> However, on the executor side, we are missing such a memory profiler. Since executors are distributed across different nodes in the cluster, we need to aggregate the per-executor profiles. Furthermore, Python worker processes are spawned per executor for Python/Pandas UDF execution, which makes memory profiling more intricate.
> This ticket proposes implementing a Memory Profiler on Executors (an illustrative executor-side sketch also follows below).
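
Since the driver is a regular Python process, the existing memory-profiler package can already be pointed at driver-side code today. Below is a minimal sketch of that; the script name, function name, and data sizes are illustrative only and are not part of the ticket.

    from memory_profiler import profile
    from pyspark.sql import SparkSession

    @profile  # memory-profiler reports line-by-line memory usage of this driver-side function
    def build_lookup():
        spark = SparkSession.builder.appName("driver-profile-demo").getOrCreate()
        # Driver-side work: pull a small range into pandas and build a plain Python dict.
        pdf = spark.range(100000).toPandas()
        lookup = {int(i): int(i) ** 2 for i in pdf["id"]}
        spark.stop()
        return lookup

    if __name__ == "__main__":
        build_lookup()
        # Run with:  python driver_profile_demo.py
        # (the @profile decorator prints the per-line memory report to stdout)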


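For the executor side, the ticket only proposes the feature, so the following is a hypothetical sketch of how enabling it might look, modeled on PySpark's existing cProfile-based UDF profiler (spark.python.profile / sc.show_profiles()). The configuration key spark.python.profile.memory and the idea that results are printed via show_profiles() are assumptions here, not settled API.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    # ASSUMPTION: "spark.python.profile.memory" is an illustrative key, modeled on the
    # existing "spark.python.profile" (cProfile-based) UDF profiler; the ticket does not
    # fix the final configuration name.
    spark = (
        SparkSession.builder
        .appName("executor-memory-profile-demo")
        .config("spark.python.profile.memory", "true")
        .getOrCreate()
    )

    @pandas_udf("long")
    def square(s: pd.Series) -> pd.Series:
        # Executes inside the Python worker on each executor; this is the code whose
        # memory usage the proposed profiler would record.
        return s * s

    spark.range(1000000).select(square("id").alias("id_squared")).show(3)

    # ASSUMPTION: results are surfaced on the driver the way the existing UDF profiler
    # surfaces them, i.e. aggregated per-UDF profiles printed via show_profiles().
    spark.sparkContext.show_profiles()

    spark.stop()

Aggregating per-UDF results on the driver would mirror how the existing CPU profiler reports, which is one way to address the multi-node aggregation concern raised in the description.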

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org