Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/07/10 22:27:00 UTC

[jira] [Commented] (SPARK-37730) plot.hist throws AttributeError on pandas=1.3.5

    [ https://issues.apache.org/jira/browse/SPARK-37730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564727#comment-17564727 ] 

Dongjoon Hyun commented on SPARK-37730:
---------------------------------------

This has been backported to branch-3.2 for Apache Spark 3.2.2 via commit https://github.com/apache/spark/commit/bc54a3f0c2e08893702c3929bfe7a9d543a08cdb

> plot.hist throws AttributeError on pandas=1.3.5
> -----------------------------------------------
>
>                 Key: SPARK-37730
>                 URL: https://issues.apache.org/jira/browse/SPARK-37730
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0, 3.3.0
>         Environment: Conda environment.yml (also tested with 3.3.0-SNAPSHOT):
> {{name: testenv}}
> {{channels:}}
> {{  - conda-forge}}
> {{dependencies:}}
> {{  - python=3.9.9}}
> {{  }}
> {{  - numpy=1.21.5}}
> {{  - pandas=1.3.5}}
> {{  - matplotlib=3.5.1}}
> {{  }}
> {{  - pyspark=3.2.0}}
>  
>            Reporter: Michał Słapek
>            Assignee: Michał Słapek
>            Priority: Major
>             Fix For: 3.3.0
>
>
> plot.hist in PySpark throws an AttributeError when pyspark.pandas is used with pandas=1.3.5.
> The pandas commit [https://github.com/pandas-dev/pandas/commit/029907c9d69a0260401b78a016a6c4515d8f1c40]
> replaced MPLPlot._add_legend_handle with MPLPlot._append_legend_handles_labels.
> I've attached a PR on GitHub that replaces the use of MPLPlot._add_legend_handle in PySpark with MPLPlot._append_legend_handles_labels.
>  
> Code:
>  
> {code:python}
> import pyspark.pandas as ps
> from matplotlib import pyplot as plt
> ps.set_option("plotting.backend", "matplotlib")
> df = ps.DataFrame({'data': [4, 5, 5, 6, 8, 9]})
> df['data'].plot.hist()
> plt.show()
>  {code}
>  
>  
> Truncated traceback:
> {code:none}
> Traceback (most recent call last):                                              
>   File "/home/develop/Documents/sparkbug/code.py", line 6, in <module>
>     df['data'].plot.hist()
>   ...
>   File "/mnt/transient/develop/miniconda3/envs/testenv/lib/python3.9/site-packages/pyspark/pandas/plot/matplotlib.py", line 403, in _make_plot
>     self._add_legend_handle(artists[0], label, index=i)
> AttributeError: 'PandasOnSparkHistPlot' object has no attribute '_add_legend_handle' {code}
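
The rename described in the issue can be tolerated on the caller's side by checking which legend hook the plot object actually exposes. The following is only a minimal sketch of that idea, not necessarily what the attached PR does. LegacyPlot, RenamedPlot and add_legend_entry are hypothetical stand-ins written for this illustration (they are not pandas or PySpark classes), and the two-argument signature of _append_legend_handles_labels is an assumption; only the two method names and the (handle, label, index=i) call come from the report and traceback above.

```python
class LegacyPlot:
    """Hypothetical stand-in for a plot class that still has the old hook."""

    def __init__(self):
        self.legend_handles = []
        self.legend_labels = []

    def _add_legend_handle(self, handle, label, index=None):
        # Old hook name, called with index=i in the traceback above.
        self.legend_handles.append(handle)
        self.legend_labels.append(label)


class RenamedPlot:
    """Hypothetical stand-in for a plot class that only has the new hook."""

    def __init__(self):
        self.legend_handles = []
        self.legend_labels = []

    def _append_legend_handles_labels(self, handle, label):
        # New hook name; the (handle, label) signature is an assumption.
        self.legend_handles.append(handle)
        self.legend_labels.append(label)


def add_legend_entry(plot, handle, label, index=None):
    """Register a legend entry through whichever hook the object provides."""
    new_hook = getattr(plot, "_append_legend_handles_labels", None)
    if new_hook is not None:
        new_hook(handle, label)
    else:
        plot._add_legend_handle(handle, label, index=index)


if __name__ == "__main__":
    for plot in (LegacyPlot(), RenamedPlot()):
        add_legend_entry(plot, handle=object(), label="data", index=0)
        print(type(plot).__name__, plot.legend_labels)
```

Looking the attribute up with getattr avoids comparing pandas version strings, at the cost of depending on private method names that may change again.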


