Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2024/01/16 01:43:32 UTC

[PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

itholic opened a new pull request, #44745:
URL: https://github.com/apache/spark/pull/44745

   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to check the Pandas installation properly.
   
   ### Why are the changes needed?
   
   The Pandas installation check does not work correctly: it raises an improper exception when Pandas is not installed.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No API change, but the user-facing error message now properly guides the user:
   
   **Before**
   ```
   >>> import pyspark.pandas
   AttributeError: module 'pandas' has no attribute '__version__'
   ```
   
   **After**
   ```
   >>> import pyspark.pandas
   pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 1.4.4 must be installed; however, it was not found.
   ```
   
   ### How was this patch tested?
   
   Manually tested.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
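
A minimal, self-contained sketch of the kind of check this PR describes. The helper name `require_module_with_version` and the plain `ImportError` are illustrative assumptions; the actual change lives in `require_minimum_pandas_version` and raises `PySparkImportError`.

```python
import importlib


def require_module_with_version(name: str) -> str:
    """Import `name` and return its version string.

    Hypothetical helper for illustration only (not part of PySpark).
    A module that imports but has no `__version__` is treated as
    "not installed", mirroring the `hasattr` check in the diff.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as error:
        raise ImportError(
            f"{name} must be installed; however, it was not found."
        ) from error

    if not hasattr(module, "__version__"):
        # The import succeeded, but the module looks like an empty shell.
        raise ImportError(f"{name} must be installed; however, it was not found.")
    return module.__version__
```

With this shape, a half-removed package produces the same "not found" message as a missing one, instead of an `AttributeError` on `__version__`.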


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1907201155

   Merged to master.




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894691125

   I roughly suspect that this happens because the same package names appear in several places in our project (such as `pyspark.pandas` and `pyspark.sql.pandas`), so a namespace conflict occurs for some reason, but I could not figure out the actual root cause yet. I can investigate more deeply.




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44745:
URL: https://github.com/apache/spark/pull/44745#discussion_r1452925274


##########
python/pyspark/sql/pandas/utils.py:
##########
@@ -27,7 +27,11 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
 
-        have_pandas = True
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None

Review Comment:
   Sure. Adjusted comments.





Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894687602

   @itholic can you actually check why this happens only in pandas though?




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44745: [SPARK-46728][PYTHON] Check Pandas installation properly
URL: https://github.com/apache/spark/pull/44745




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894707004

   > It'd be great if we can at least google it and confirm that it only happens in pandas before merging this.
   
   Yeah, I googled when I was submitting this PR, but unfortunately couldn't find any clue. Let me investigate some more today.




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44745:
URL: https://github.com/apache/spark/pull/44745#discussion_r1452863818


##########
python/pyspark/sql/pandas/utils.py:
##########
@@ -27,7 +27,11 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
 
-        have_pandas = True
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None

Review Comment:
   Oh, but this is reproduced when I run it in the PySpark shell:
   
   **No Pandas**
   ```
   ./bin/pyspark
   import pyspark.pandas
   # pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 1.4.4 must be installed; however, it was not found.
   ```
   
   **Old Pandas**
   ```
   ./bin/pyspark
   import pyspark.pandas
   # pyspark.errors.exceptions.base.PySparkImportError: [UNSUPPORTED_PACKAGE_VERSION] Pandas >= 1.4.4 must be installed; however, your version is 1.4.0.
   ```
   
   Maybe this is only my local issue? Did you test after uninstalling Pandas?





Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic closed pull request #44745: [SPARK-46728][PYTHON] Check Pandas installation properly
URL: https://github.com/apache/spark/pull/44745




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44745:
URL: https://github.com/apache/spark/pull/44745#discussion_r1452849859


##########
python/pyspark/sql/pandas/utils.py:
##########
@@ -27,7 +27,11 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
 
-        have_pandas = True
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None

Review Comment:
   It works for me without the fix:
   
   ```
   >>> import pyspark.pandas
   ...
   >>>
   ```
   
   The problem is when you run the tests in PyCharm, it tries to import `pandas` under `pyspark`. I think we should find another way to work around this.
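
The shadowing described above can be sketched in isolation. This assumes only that the test runner ends up with the project directory ahead of site-packages on `sys.path`; the exact PyCharm behavior is not confirmed here, and `shadow_demo` is a hypothetical stand-in for `pandas`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical name: "shadow_demo" plays the role of `pandas`.
NAME = "shadow_demo"


def make_package(parent: Path, label: str) -> Path:
    """Create parent/NAME/__init__.py exposing LABEL and return parent."""
    package_dir = parent / NAME
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f"LABEL = {label!r}\n")
    return parent


root = Path(tempfile.mkdtemp())
site_dir = make_package(root / "site-packages", "third-party")
project_dir = make_package(root / "project", "project-subpackage")

# The project directory comes first, as it would if a test runner put it
# at the front of sys.path.
sys.path[:0] = [str(project_dir), str(site_dir)]
importlib.invalidate_caches()
try:
    module = importlib.import_module(NAME)
finally:
    sys.path.remove(str(project_dir))
    sys.path.remove(str(site_dir))

# The first match on sys.path wins, so the project's package is imported.
print(module.LABEL)
```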





Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44745:
URL: https://github.com/apache/spark/pull/44745#discussion_r1452872993


##########
python/pyspark/sql/pandas/utils.py:
##########
@@ -27,7 +27,11 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
 
-        have_pandas = True
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None

Review Comment:
   I think it's because the removed pandas isn't actually fully removed. E.g., in my case the `/.../miniconda3/envs/python3.11/lib/python3.11/site-packages/pandas` directory is still there, so my Python thinks pandas is still installed, but it's an empty package.
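
A small sketch of why a leftover directory imports as an "empty package". It assumes the directory holds only stub files and no `__init__.py`, in which case Python treats it as a namespace package. The name `leftover_pkg_demo` is hypothetical, chosen so the demo cannot clash with a real install.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, standing in for a half-removed `pandas`.
PACKAGE = "leftover_pkg_demo"


def import_leftover_directory():
    """Create a directory holding only a stub file and import it."""
    root = Path(tempfile.mkdtemp())
    package_dir = root / PACKAGE
    package_dir.mkdir()
    # Only a type stub survives, as in a partially removed install.
    (package_dir / "__init__.pyi").write_text("")

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(PACKAGE)
    finally:
        sys.path.remove(str(root))


module = import_leftover_directory()
# The import succeeds, yet the module has no source file and no version.
print(getattr(module, "__file__", None))
print(hasattr(module, "__version__"))
```

Because the import itself does not fail, a `try: import ...` guard alone cannot tell this state apart from a working install.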





Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894850179

   It seems that if other packages have files inside the package we're trying to remove, `pip uninstall` puts those files in the `Would not remove` list and doesn't actually delete them.
   
   In our case, `pip uninstall` prints a `Would not remove` list and leaves those files in place when `pandas-stubs` is still installed, which is why `pandas` is not completely removed:
   
   ```
   (pyspark-dev-env) haejoon.lee@NQ679Q495F spark % pip uninstall pandas
   Found existing installation: pandas 2.1.4
   Uninstalling pandas-2.1.4:
     Would remove:
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas-2.1.4.dist-info/*
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas/*
     Would not remove (might be manually added):
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas/__init__.pyi
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas/_config/__init__.pyi
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas/_config/config.pyi
       ...
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas/util/_tester.pyi
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas/util/_validators.pyi
       /Users/haejoon.lee/anaconda3/envs/pyspark-dev-env/lib/python3.9/site-packages/pandas/util/testing.pyi
   ```
   
   So, to completely remove `pandas`, we need to uninstall `pandas-stubs` first and then uninstall `pandas`.
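
One way to tell a leftover directory from a real install is to compare what the import system sees with the distribution metadata. The sketch below is a hypothetical diagnostic helper (not part of pip or PySpark) built on the standard `importlib.util.find_spec` and `importlib.metadata.version`.

```python
import importlib.util
from importlib import metadata


def installation_state(import_name: str, dist_name: str) -> str:
    """Classify a package as "installed", "leftover", or "missing".

    Hypothetical helper for illustration only.
    """
    try:
        metadata.version(dist_name)
        has_metadata = True
    except metadata.PackageNotFoundError:
        has_metadata = False

    spec = importlib.util.find_spec(import_name)
    if spec is None:
        return "missing"
    # A namespace package has no origin file: the directory exists,
    # but there is no __init__.py behind it.
    is_namespace = spec.origin in (None, "namespace")
    if is_namespace and not has_metadata:
        return "leftover"
    return "installed"
```

In the situation described above, `installation_state("pandas", "pandas")` would be expected to report `"leftover"` until `pandas-stubs` is uninstalled as well.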




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1907200103

   Hmm... actually, I just noticed that this harms dev testability in some cases, such as https://github.com/apache/spark/pull/44778, so maybe we had better bandaid this? WDYT @HyukjinKwon @dongjoon-hyun?




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894863725

   On second thought, this issue seems like a corner case according to https://github.com/apache/spark/pull/44745#issuecomment-1894850179. Both `pandas` and `pandas-stubs` are specified as required packages, so I think it would be a good idea not to deal with problems caused by deleting required packages.




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894859579

   Updated PR description and comment accordingly.




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894686764

   > Could you do the same things for the other packages like PyArrow, @itholic ?
   
   Sure. I just confirmed that other packages (e.g. PyArrow) work as expected without any changes, unlike Pandas:
   
   ```python
   >>> import pyspark.pandas
   pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] PyArrow >= 4.0.0 must be installed; however, it was not found.
   ```
   
   Do you know of any other packages that reproduce the same issue as Pandas?




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44745:
URL: https://github.com/apache/spark/pull/44745#discussion_r1452870616


##########
python/pyspark/sql/pandas/utils.py:
##########
@@ -27,7 +27,11 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
 
-        have_pandas = True
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None

Review Comment:
   Oh, okay I face the same issue.





Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1907200860

   okay, let's go ahead.




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894688648

   My concern is that this is sort of a hacky bandaid fix. It is a bit weird that we do this only for pandas without knowing what exactly is going on.




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894864116

   Let me close this PR for now, but please feel free to ping me if there are any other opinions!




Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44745:
URL: https://github.com/apache/spark/pull/44745#issuecomment-1894705810

   > I roughly suspect that this happened due to the same package names in our project here and there (such as pyspark.pandas, pyspark.sql.pandas), so the namespace conflicts issue occur for some reason only in pandas, but could not figure out the actual root cause right now.
   
   I know about this one, because the tests sometimes fail in the IDE for this reason.
   
   > The reason why I suspect in this way is that because the path /../site-packages/pandas is only not deleted clearly with PySpark dev env when uninstalling pandas.
   
   This one can happen in other packages as well. If that's the case, we should also address the same thing in other packages, e.g., pandas udf and spark connect. It'd be great if we can at least google it and confirm that it only happens in pandas before merging this.






Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44745:
URL: https://github.com/apache/spark/pull/44745#discussion_r1452873569


##########
python/pyspark/sql/pandas/utils.py:
##########
@@ -27,7 +27,11 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
 
-        have_pandas = True
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None

Review Comment:
   Can we explain it in the PR description and add a comment, please?


