Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/01 00:50:13 UTC

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38055: [SPARK-40590][TEST] Fix `ps.read_parquet` when `pandas_metadata` is `True`

HyukjinKwon commented on code in PR #38055:
URL: https://github.com/apache/spark/pull/38055#discussion_r985016033


##########
python/pyspark/pandas/tests/test_dataframe_spark_io.py:
##########
@@ -96,11 +97,18 @@ def test_parquet_read_with_pandas_metadata(self):
             self.assert_eq(ps.read_parquet(path2, pandas_metadata=True), expected2)
 
             expected3 = expected2.set_index("index", append=True)
+            # There is a bug in `to_parquet` from pandas 1.5.0 when writing MultiIndex.
+            # See https://github.com/pandas-dev/pandas/issues/48848 for the reported issue.
+            if LooseVersion(pd.__version__) >= LooseVersion("1.5.0"):

Review Comment:
   ```suggestion
               if LooseVersion(pd.__version__) == LooseVersion("1.5.0"):
   ```
   
   Maybe use `==` here instead, since the bug will presumably be fixed in pandas 1.5.1.
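
To illustrate the difference between the two comparisons: with `>=`, the workaround keeps applying to every later pandas release, while `==` limits it to 1.5.0 only. The sketch below is a minimal, standard-library-only illustration and not the code in the PR; `release_tuple` and `affected_by_multiindex_bug` are hypothetical helper names, and the claim that only 1.5.0 is affected is the assumption made in this comment, not a confirmed fact.

```python
import re


def release_tuple(version: str) -> tuple:
    """Parse the leading numeric part of a version string into a 3-tuple.

    Hypothetical helper: '1.5.0' -> (1, 5, 0), '1.5' -> (1, 5, 0).
    Any suffix such as 'rc1' or '.dev0' is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")][:3]
    # Pad short versions so that '1.5' compares equal to '1.5.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def affected_by_multiindex_bug(pandas_version: str) -> bool:
    """Return True only for pandas 1.5.0.

    Assumption (from the review comment): the MultiIndex `to_parquet`
    bug is present in 1.5.0 and will be fixed in 1.5.1.
    """
    return release_tuple(pandas_version) == (1, 5, 0)


if __name__ == "__main__":
    for candidate in ("1.4.4", "1.5.0", "1.5.1", "2.0.0"):
        print(candidate, affected_by_multiindex_bug(candidate))
```

In a test, the version string would come from `pd.__version__`. If the fix turns out not to land in 1.5.1, the equality check would need to be widened again.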



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

