Posted to issues@spark.apache.org by "Luca Canali (Jira)" <ji...@apache.org> on 2022/01/12 19:00:00 UTC

[jira] [Updated] (SPARK-34265) Instrument Python UDF execution using SQL Metrics

     [ https://issues.apache.org/jira/browse/SPARK-34265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Canali updated SPARK-34265:
--------------------------------
    Attachment: PandasUDF_ArrowEvalPython_Metrics.png

> Instrument Python UDF execution using SQL Metrics
> -------------------------------------------------
>
>                 Key: SPARK-34265
>                 URL: https://issues.apache.org/jira/browse/SPARK-34265
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.1.1
>            Reporter: Luca Canali
>            Priority: Minor
>         Attachments: PandasUDF_ArrowEvalPython_Metrics.png, PythonSQLMetrics_Jira_Picture.png, Python_UDF_instrumentation_lite_ArrowEvalPython.png, Python_UDF_instrumentation_lite_BatchEvalPython.png, proposed_Python_SQLmetrics_v20210128.png
>
>
> This proposes adding SQLMetrics instrumentation for Python UDFs. It is aimed at improving monitoring and performance troubleshooting of Python code called by Spark via UDF, Pandas UDF, or MapPartitions.
> The new metrics are exposed to end users in the Web UI, in the SQL tab, for the execution steps related to Python UDF execution, namely BatchEvalPython, ArrowEvalPython, AggregateInPandas, FlatMapGroupsInPandas, FlatMapCoGroupsInPandas, WindowInPandas.
> See also the attached screenshot.
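
For readers unfamiliar with this kind of instrumentation, the general idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the code proposed in SPARK-34265 and not a Spark API: the names UdfMetrics and instrument, and the three counters, are hypothetical. The wrapper times each call to a batch-oriented UDF and accumulates counts, loosely mirroring what an execution step could report as metrics.

```python
import time
from dataclasses import dataclass


@dataclass
class UdfMetrics:
    """Hypothetical counters; these are not Spark's actual SQLMetric names."""
    batches: int = 0
    rows: int = 0
    exec_time_ns: int = 0


def instrument(udf, metrics):
    """Wrap a batch-oriented UDF so every call updates `metrics`."""
    def run(batches):
        for batch in batches:
            start = time.perf_counter_ns()
            result = udf(batch)
            # Only the time spent inside the UDF call is accumulated.
            metrics.exec_time_ns += time.perf_counter_ns() - start
            metrics.batches += 1
            metrics.rows += len(batch)
            yield result
    return run


metrics = UdfMetrics()
# A toy "UDF" that adds one to every value of a batch (a plain list here).
plus_one = instrument(lambda batch: [x + 1 for x in batch], metrics)
output = list(plus_one([[1, 2, 3], [4, 5]]))
print(output, metrics.batches, metrics.rows, metrics.exec_time_ns)
```

In this sketch the counters live in an ordinary Python object; in Spark, the corresponding values would have to be reported back to the JVM side to appear in the SQL tab, which the sketch does not attempt to model.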



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org