Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/07/10 22:27:00 UTC
[jira] [Commented] (SPARK-37730) plot.hist throws AttributeError on pandas=1.3.5
[ https://issues.apache.org/jira/browse/SPARK-37730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564727#comment-17564727 ]
Dongjoon Hyun commented on SPARK-37730:
---------------------------------------
This has been backported to branch-3.2 for Apache Spark 3.2.2 via https://github.com/apache/spark/commit/bc54a3f0c2e08893702c3929bfe7a9d543a08cdb
> plot.hist throws AttributeError on pandas=1.3.5
> -----------------------------------------------
>
> Key: SPARK-37730
> URL: https://issues.apache.org/jira/browse/SPARK-37730
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0, 3.3.0
> Environment: Conda environment.yml (also tested with 3.3.0-SNAPSHOT):
> {{name: testenv}}
> {{channels:}}
> {{ - conda-forge}}
> {{dependencies:}}
> {{ - python=3.9.9}}
> {{ - numpy=1.21.5}}
> {{ - pandas=1.3.5}}
> {{ - matplotlib=3.5.1}}
> {{ - pyspark=3.2.0}}
>
> Reporter: Michał Słapek
> Assignee: Michał Słapek
> Priority: Major
> Fix For: 3.3.0
>
>
> plot.hist in PySpark raises an AttributeError when pyspark.pandas is used with pandas=1.3.5.
> The pandas commit [https://github.com/pandas-dev/pandas/commit/029907c9d69a0260401b78a016a6c4515d8f1c40]
> replaced MPLPlot._add_legend_handle with MPLPlot._append_legend_handles_labels.
> I've attached a PR on GitHub that replaces the use of MPLPlot._add_legend_handle in PySpark with MPLPlot._append_legend_handles_labels.
>
> Code:
>
> {code:python}
> import pyspark.pandas as ps
> from matplotlib import pyplot as plt
> ps.set_option("plotting.backend", "matplotlib")
> df = ps.DataFrame({'data': [4, 5, 5, 6, 8, 9]})
> df['data'].plot.hist()
> plt.show()
> {code}
>
>
> Truncated traceback:
> {code:java}
> Traceback (most recent call last):
> File "/home/develop/Documents/sparkbug/code.py", line 6, in <module>
> df['data'].plot.hist()
> ...
> File "/mnt/transient/develop/miniconda3/envs/testenv/lib/python3.9/site-packages/pyspark/pandas/plot/matplotlib.py", line 403, in _make_plot
> self._add_legend_handle(artists[0], label, index=i)
> AttributeError: 'PandasOnSparkHistPlot' object has no attribute '_add_legend_handle' {code}
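
For code that has to run against pandas versions on both sides of the rename described in the issue, one possible approach is to look up whichever method name exists at call time. The sketch below is only an illustration and is not what the attached PR does (per the issue, the PR simply switches PySpark to the new name). The helper append_legend_entry and the two stand-in classes are hypothetical, and the (handle, label) call signature of both private pandas methods is an assumption; only the old method's index keyword is visible in the traceback above.

```python
def append_legend_entry(plot, handle, label):
    """Add a legend entry using whichever private method the plot object has.

    Hypothetical helper. Assumes both methods accept (handle, label);
    the old method's optional index argument is simply not passed.
    Returns the name of the method that was called.
    """
    for name in ("_append_legend_handles_labels", "_add_legend_handle"):
        method = getattr(plot, name, None)
        if method is not None:
            method(handle, label)
            return name
    raise AttributeError(
        f"{type(plot).__name__!r} object has neither legend helper method"
    )


# Stand-ins for plot classes before and after the rename (not pandas classes).
class OldStylePlot:
    def __init__(self):
        self.entries = []

    def _add_legend_handle(self, handle, label, index=None):
        self.entries.append((handle, label))


class NewStylePlot:
    def __init__(self):
        self.entries = []

    def _append_legend_handles_labels(self, handle, label):
        self.entries.append((handle, label))


old_plot, new_plot = OldStylePlot(), NewStylePlot()
used_old = append_legend_entry(old_plot, "artist", "data")
used_new = append_legend_entry(new_plot, "artist", "data")
```

Both names are private pandas internals, so a shim like this can break again on any future pandas release; pinning a tested pandas version is the more conservative option.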