Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2023/08/06 01:25:00 UTC

[jira] [Resolved] (SPARK-44670) Fix the `test_to_excel` tests for python3.7

     [ https://issues.apache.org/jira/browse/SPARK-44670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44670.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 42339
[https://github.com/apache/spark/pull/42339]

> Fix the `test_to_excel` tests for python3.7
> -------------------------------------------
>
>                 Key: SPARK-44670
>                 URL: https://issues.apache.org/jira/browse/SPARK-44670
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.4.1
>            Reporter: Madhukar
>            Assignee: Madhukar
>            Priority: Minor
>             Fix For: 3.4.2
>
>
> With Python 3.7 and openpyxl installed, the test fails with this error:
> ======================================================================
> ERROR: test_to_excel (pyspark.pandas.tests.test_dataframe_conversion.DataFrameConversionTest)
> Traceback (most recent call last):
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 102, in test_to_excel
>     dataframes = self.get_excel_dfs(pandas_on_spark_location, pandas_location)
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 89, in get_excel_dfs
>     "got": pd.read_excel(pandas_on_spark_location, index_col=0),
>   File "/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py", line 296, in wrapper
>     return func(*args, **kwargs)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
>     io = ExcelFile(io, engine=engine)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 867, in __init__
>     self._reader = self._engines[engine](self._io)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
>     import_optional_dependency("xlrd", extra=err_msg)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/compat/_optional.py", line 110, in import_optional_dependency
>     raise ImportError(msg) from None
> ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
> ----------------------------------------------------------------------
>
> With xlrd 2.0.1 installed as well, the test fails with a different error:
> ======================================================================
> ERROR: test_to_excel (pyspark.pandas.tests.test_dataframe_conversion.DataFrameConversionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 102, in test_to_excel
>     dataframes = self.get_excel_dfs(pandas_on_spark_location, pandas_location)
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 89, in get_excel_dfs
>     "got": pd.read_excel(pandas_on_spark_location, index_col=0),
>   File "/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py", line 296, in wrapper
>     return func(*args, **kwargs)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
>     io = ExcelFile(io, engine=engine)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 867, in __init__
>     self._reader = self._engines[engine](self._io)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
>     super().__init__(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 353, in __init__
>     self.book = self.load_workbook(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
>     return open_workbook(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/xlrd/__init__.py", line 170, in open_workbook
>     raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
> xlrd.biffh.XLRDError: Excel xlsx file; not supported
> ----------------------------------------------------------------------
>  
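
The two tracebacks above suggest that this pandas build falls back to the xlrd reader in `pd.read_excel`, and that xlrd 2.0.1 refuses .xlsx files. One possible workaround is to pass the reader engine explicitly instead of relying on the default. The sketch below is an illustration only and may not match what pull request 42339 actually changes; `pick_excel_engine` is a hypothetical helper, not a pandas or Spark function.

```python
from pathlib import Path


# Hypothetical helper for illustration; not taken from the Spark patch.
def pick_excel_engine(path):
    """Return a pandas reader engine name based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xlsm"}:
        # xlrd 2.x rejects these formats, so use openpyxl instead.
        return "openpyxl"
    if suffix == ".xls":
        # Legacy binary workbooks are still read by xlrd.
        return "xlrd"
    raise ValueError(f"unsupported Excel extension: {suffix!r}")


# Intended use (needs pandas and openpyxl installed, so shown as a comment):
#   pd.read_excel(path, index_col=0, engine=pick_excel_engine(path))
```

Naming the engine at the call site keeps the read independent of which optional Excel packages happen to be installed in the test environment.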



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
