Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/06/05 04:01:00 UTC
[jira] [Resolved] (SPARK-31903) toPandas with Arrow enabled doesn't show metrics in Query UI.
[ https://issues.apache.org/jira/browse/SPARK-31903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-31903.
----------------------------------
Assignee: Takuya Ueshin
Resolution: Fixed
> toPandas with Arrow enabled doesn't show metrics in Query UI.
> -------------------------------------------------------------
>
> Key: SPARK-31903
> URL: https://issues.apache.org/jira/browse/SPARK-31903
> Project: Spark
> Issue Type: Bug
> Components: PySpark, R
> Affects Versions: 2.4.5, 3.0.0
> Reporter: Takuya Ueshin
> Assignee: Takuya Ueshin
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2020-06-03 at 4.47.07 PM.png, Screen Shot 2020-06-03 at 4.47.27 PM.png
>
>
> When calling {{toPandas}}, the Query UI usually shows each plan node's metrics along with the corresponding Stage ID and Task ID:
> {code:java}
> >>> df = spark.createDataFrame([(1, 10, 'abc'), (2, 20, 'def')], schema=['x', 'y', 'z'])
> >>> df.toPandas()
>    x   y    z
> 0  1  10  abc
> 1  2  20  def
> {code}
> !Screen Shot 2020-06-03 at 4.47.07 PM.png!
> However, if Arrow execution is enabled, the Query UI shows only the plan nodes, without their metrics, and the reported duration is incorrect:
> {code:java}
> >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> >>> df.toPandas()
>    x   y    z
> 0  1  10  abc
> 1  2  20  def
> {code}
>
> !Screen Shot 2020-06-03 at 4.47.27 PM.png!
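
To compare the two code paths described above side by side, one option is to run {{toPandas}} twice against the same DataFrame, once with Arrow disabled and once with it enabled, and then inspect the two resulting entries in the Query UI. The following is only a minimal sketch of that idea, not the fix that resolved this issue. The helper names arrow_enabled and collect_both_ways are hypothetical; the only Spark calls assumed are spark.conf.get, spark.conf.set, spark.conf.unset and DataFrame.toPandas, and the configuration key is the one used in the report.

```python
from contextlib import contextmanager

# Config key taken from the report above.
ARROW_KEY = "spark.sql.execution.arrow.pyspark.enabled"


@contextmanager
def arrow_enabled(spark, enabled):
    """Temporarily set the Arrow flag on `spark` (hypothetical helper).

    The previous value is restored on exit; if the key was not set
    before, it is unset again.
    """
    previous = spark.conf.get(ARROW_KEY, None)
    # Pass the flag as the string "true"/"false".
    spark.conf.set(ARROW_KEY, str(bool(enabled)).lower())
    try:
        yield
    finally:
        if previous is None:
            spark.conf.unset(ARROW_KEY)
        else:
            spark.conf.set(ARROW_KEY, previous)


def collect_both_ways(spark, df):
    """Call df.toPandas() without Arrow, then with Arrow (hypothetical helper).

    Each call produces its own query execution, so the two entries can be
    compared in the Query UI afterwards.
    """
    with arrow_enabled(spark, False):
        plain = df.toPandas()
    with arrow_enabled(spark, True):
        with_arrow = df.toPandas()
    return plain, with_arrow
```

With a live session this would be called as plain, with_arrow = collect_both_ways(spark, df). On the affected versions listed above, the second query is the one expected to show plan nodes without metrics.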
--
This message was sent by Atlassian Jira
(v8.3.4#803005)