Posted to issues@spark.apache.org by "Luca Canali (Jira)" <ji...@apache.org> on 2019/12/19 15:10:00 UTC

[jira] [Updated] (SPARK-30306) Instrument Python UDF execution time and metrics using Spark Metrics system

     [ https://issues.apache.org/jira/browse/SPARK-30306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Canali updated SPARK-30306:
--------------------------------
    Attachment: PandasUDF_Time_Instrumentation_Annotated.png

> Instrument Python UDF execution time and metrics using Spark Metrics system
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-30306
>                 URL: https://issues.apache.org/jira/browse/SPARK-30306
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Luca Canali
>            Priority: Minor
>         Attachments: PandasUDF_Time_Instrumentation_Annotated.png
>
>
> This proposes extending Spark instrumentation with metrics aimed at understanding the performance of Python code called by Spark via UDF, Pandas UDF, or mapPartitions. The relevant performance counters are exposed through the Spark Metrics System (based on the Dropwizard library), which makes it easy to consume the metrics produced by executors, for example in a performance dashboard. See also the attached screenshot.
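
As a rough illustration of how executor metrics could reach a dashboard, the sketch below builds the configuration entries for Spark's existing Graphite sink. The sink class and the `spark.metrics.conf.*.sink.graphite.*` keys come from the Spark Metrics System; the helper name `graphite_sink_conf`, the host `graphite.example.org`, and the prefix `pyspark_udf_demo` are assumptions made for this example. The names of the proposed Python UDF metrics are not given in this message, so none are shown.

```python
def graphite_sink_conf(host, port, prefix, period_seconds=10):
    """Build Spark settings that route Dropwizard metrics to a Graphite endpoint.

    Hypothetical helper for illustration; only the configuration keys and the
    sink class name are part of Spark itself.
    """
    base = "spark.metrics.conf.*.sink.graphite"
    return {
        f"{base}.class": "org.apache.spark.metrics.sink.GraphiteSink",
        f"{base}.host": host,
        f"{base}.port": str(port),
        f"{base}.period": str(period_seconds),  # reporting interval
        f"{base}.unit": "seconds",              # unit of the interval
        f"{base}.prefix": prefix,               # namespace for this application
    }


# Assumed endpoint and prefix; replace with values for a real deployment.
conf = graphite_sink_conf("graphite.example.org", 2003, "pyspark_udf_demo")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each key/value pair could then be passed to `SparkSession.builder.config(key, value)` before the session is created, or supplied with `--conf` on `spark-submit`.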



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
