Posted to reviews@spark.apache.org by "dongjoon-hyun (via GitHub)" <gi...@apache.org> on 2023/09/29 22:03:16 UTC

[GitHub] [spark] dongjoon-hyun opened a new pull request, #43184: [SPARK-44120][PYTHON] Support Python 3.12

dongjoon-hyun opened a new pull request, #43184:
URL: https://github.com/apache/spark/pull/43184

   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1341951003


##########
python/pyspark/pandas/plot/matplotlib.py:
##########
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-from distutils.version import LooseVersion
+from packaging.version import Version

Review Comment:
   Yes, currently, it has to be installed manually, like NumPy.
   
   However, that is independent of supporting Python 3.12 itself.
   
   Can we do the porting after this?
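A minimal sketch of what the `LooseVersion` to `Version` switch means for callers, assuming the third-party `packaging` distribution may or may not be installed (on a plain Spark download it has to be installed manually, as noted above). `is_at_least` is a hypothetical helper, not a PySpark API. One behavioral difference worth knowing: `packaging` follows PEP 440, so a release candidate sorts before its final release.

```python
# `packaging` is NOT in the standard library, so guard the import.
try:
    from packaging.version import Version
except ImportError:
    Version = None


def is_at_least(installed: str, minimum: str) -> bool:
    """Hypothetical helper: True if `installed` >= `minimum` under PEP 440 ordering."""
    if Version is None:
        raise ImportError("The 'packaging' package is required: pip install packaging")
    return Version(installed) >= Version(minimum)


if Version is not None:
    print(is_at_least("1.10", "1.9"))           # True: numeric, not lexicographic
    print(is_at_least("3.12.0rc2", "3.12.0"))   # False: rc2 precedes the final release
```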



[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741672379

   It seems that there are some side effects on the installation.
   ```
   [info] org.apache.spark.sql.SQLQueryTestSuite *** ABORTED *** (6 milliseconds)
   [info]   java.lang.RuntimeException: Python executable [python3] and/or pyspark are unavailable.
   ```
   
   Let me check this first. I converted this PR to a draft in the meantime.


[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741747901

   Hi, @HyukjinKwon . Could you review this PR?


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1342032967


##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -17,7 +17,7 @@
 
 import inspect
 import unittest

Review Comment:
   It's not difficult to copy or to reimplement, @HyukjinKwon and @viirya . 
   
   However, it's under the PSF License, which is different from Py4J and cloudpickle (both under the BSD 3-Clause License):
   https://github.com/apache/spark/blob/58c24a5719b8717ea37347c668c9df8a3714ae3c/LICENSE-binary#L460-L462
   
   I believe it's compatible, but we need to double-check before doing that, because our Apache Spark binary distribution doesn't include anything under the `Python Software Foundation` license yet.



[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741919654

   Following the review comments, I made a separate PR, @HyukjinKwon and @viirya:
   - #43192


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1341951167


##########
python/pyspark/pandas/plot/matplotlib.py:
##########
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-from distutils.version import LooseVersion
+from packaging.version import Version

Review Comment:
   I can also proceed to port `packaging` separately tomorrow, because the USA doesn't have Chuseok, @HyukjinKwon. :)



[GitHub] [spark] viirya commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741846176

   > Actually, `pyspark` shell fails to start. So, we need to embed `packaging` like Py4J.
   
   From the error shown in the description for `distutils`, I guess you mean the `pyspark` shell fails to start at the same location?
   
   ```
       from pyspark.sql.pandas.conversion import PandasConversionMixin
     File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/pandas/conversion.py", line 29, in <module>
       from distutils.version import LooseVersion
   ModuleNotFoundError: No module named 'distutils'
   ```
   
   This is only triggered if pandas/pyarrow is installed/enabled. As you can see, it is under `sql/pandas`, so if you don't have pandas installed, I think it won't reach that code.


[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741850670

   Let me follow your advice, @viirya. I'm trying to add a conditional import in `./python/pyspark/sql/pandas/conversion.py`.
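A minimal sketch of one way to make such an import conditional, assuming the import is moved from module level into the function body that actually needs it, so importing the module (for example at `pyspark` shell startup) never touches `packaging`. `lazy_import` and `pandas_is_new_enough` are hypothetical names, not PySpark APIs, and the install hint assumes the pip distribution name matches the top-level module name.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, feature: str) -> ModuleType:
    """Import `module_name` only at the moment `feature` needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        # Assumes the pip distribution name equals the top-level module name.
        top_level = module_name.split(".")[0]
        raise ImportError(
            f"{feature} requires the '{top_level}' package; "
            f"install it with 'pip install {top_level}'."
        ) from exc


def pandas_is_new_enough(pandas_version: str, minimum: str) -> bool:
    """Hypothetical pandas-only code path: `packaging` is imported here, not at module load."""
    version = lazy_import("packaging.version", "pandas conversion")
    return version.Version(pandas_version) >= version.Version(minimum)
```

With this layout, a missing `packaging` only surfaces as an error when a pandas conversion is attempted, instead of preventing `import pyspark` altogether.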


[GitHub] [spark] dongjoon-hyun closed pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12
URL: https://github.com/apache/spark/pull/43184


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1342032967


##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -17,7 +17,7 @@
 
 import inspect
 import unittest

Review Comment:
   It's not difficult to copy or to reimplement, @HyukjinKwon and @viirya . 
   
   However, it's under the PSF License, which is different from Py4J and cloudpickle (both under the BSD 3-Clause License):
   https://github.com/apache/spark/blob/58c24a5719b8717ea37347c668c9df8a3714ae3c/LICENSE-binary#L460-L462
   
   I believe it's compatible, but we need to double-check before doing that, because our Apache Spark (up to 3.5.0) binary distribution doesn't include anything under the `Python Software Foundation` license yet.



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1341951167


##########
python/pyspark/pandas/plot/matplotlib.py:
##########
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-from distutils.version import LooseVersion
+from packaging.version import Version

Review Comment:
   I can also proceed to port `packaging` tomorrow, because the USA doesn't have Chuseok, @HyukjinKwon. :)



[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741637748

   All Python tests passed.
   
   <img width="215" alt="Screenshot 2023-09-29 at 7 47 11 PM" src="https://github.com/apache/spark/assets/9700541/709a8ba7-46cf-4c40-809f-8fa0fbdd6077">
   


[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741834732

   Could you review this PR, @viirya ?


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1341950311


##########
python/pyspark/pandas/plot/matplotlib.py:
##########
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-from distutils.version import LooseVersion
+from packaging.version import Version

Review Comment:
   I can take a look around (mid) next week too.



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1342008235


##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -17,7 +17,7 @@
 
 import inspect
 import unittest

Review Comment:
   I'm on my phone, so please ignore this comment if it's wrong. If we use `distutils` only for `LooseVersion`, we could just have our own `LooseVersion` in PySpark (instead of embedding the package).



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1342049909


##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -17,7 +17,7 @@
 
 import inspect
 import unittest

Review Comment:
   We could just write our own instead of copying. It's just a version string check in the end. I have not seen their code yet, so I can write one on my own too :-).
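To illustrate how small a home-grown check can be, here is a minimal sketch, written from scratch and not copied from `distutils`. `SimpleLooseVersion` is a hypothetical name and is not guaranteed to match `LooseVersion` in every edge case: it splits the string into digit runs and letter runs, compares digit runs as integers, and tags each component with its kind so mixed comparisons never raise `TypeError`. Like `LooseVersion`, it does not special-case pre-releases.

```python
import re
from functools import total_ordering

# Split into runs of digits, runs of letters, and dots (dots are dropped below).
_COMPONENT_RE = re.compile(r"(\d+|[a-z]+|\.)", re.IGNORECASE)


@total_ordering
class SimpleLooseVersion:
    """Hypothetical stand-in for distutils' LooseVersion (sketch only)."""

    def __init__(self, vstring: str):
        self.vstring = vstring
        parts = [p for p in _COMPONENT_RE.split(vstring) if p and p != "."]
        # (0, int) for numeric parts so "1.10" > "1.9"; (1, str) for everything else.
        # The leading tag keeps int/str comparisons from raising TypeError.
        self.components = tuple(
            (0, int(p)) if p.isdecimal() else (1, p.lower()) for p in parts
        )

    def __eq__(self, other):
        if not isinstance(other, SimpleLooseVersion):
            return NotImplemented
        return self.components == other.components

    def __lt__(self, other):
        if not isinstance(other, SimpleLooseVersion):
            return NotImplemented
        return self.components < other.components

    def __hash__(self):
        return hash(self.components)

    def __repr__(self):
        return f"SimpleLooseVersion({self.vstring!r})"


print(SimpleLooseVersion("1.10") > SimpleLooseVersion("1.9"))          # True
print(SimpleLooseVersion("3.12.0rc2") > SimpleLooseVersion("3.12.0"))  # True: no pre-release handling
```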



[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741849856

   You are right that it should work that way. However, the technical problem is that the PySpark code doesn't properly guard this import in `SparkSession`. Here is the error message when neither pandas nor the `packaging` package is installed.
   ```
   $ bin/pyspark
   Python 3.12.0rc2 (main, Sep 21 2023, 21:22:29) [Clang 14.0.0 (clang-1400.0.28.1)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   Traceback (most recent call last):
     File "/Users/dongjoon/PRS/SPARK-44120/python/pyspark/shell.py", line 31, in <module>
       import pyspark
     File "/Users/dongjoon/PRS/SPARK-44120/python/pyspark/__init__.py", line 148, in <module>
       from pyspark.sql import SQLContext, HiveContext, Row  # noqa: F401
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/Users/dongjoon/PRS/SPARK-44120/python/pyspark/sql/__init__.py", line 43, in <module>
       from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration, UDTFRegistration
     File "/Users/dongjoon/PRS/SPARK-44120/python/pyspark/sql/context.py", line 39, in <module>
       from pyspark.sql.session import _monkey_patch_RDD, SparkSession
     File "/Users/dongjoon/PRS/SPARK-44120/python/pyspark/sql/session.py", line 47, in <module>
       from pyspark.sql.dataframe import DataFrame
     File "/Users/dongjoon/PRS/SPARK-44120/python/pyspark/sql/dataframe.py", line 64, in <module>
       from pyspark.sql.pandas.conversion import PandasConversionMixin
     File "/Users/dongjoon/PRS/SPARK-44120/python/pyspark/sql/pandas/conversion.py", line 29, in <module>
       from packaging.version import Version
   ModuleNotFoundError: No module named 'packaging'
   ```
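The traceback shows `packaging` being imported at module load through `session.py`, `dataframe.py`, and `conversion.py`, even though pandas is absent. As a debugging aid, here is a minimal sketch of a check that reports which optional packages are missing without importing them, using `importlib.util.find_spec` on top-level names. The package list and the `missing_packages` helper are hypothetical and are not taken from Spark's dependency metadata.

```python
import importlib.util
from typing import Iterable, List

# Hypothetical list of optional packages to probe (top-level names only).
OPTIONAL_PACKAGES = ("pandas", "pyarrow", "packaging")


def missing_packages(names: Iterable[str] = OPTIONAL_PACKAGES) -> List[str]:
    """Return the top-level packages in `names` that cannot be found.

    find_spec locates a top-level module without executing it, so this
    check does not trigger the import chain shown in the traceback.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing optional packages: " + ", ".join(missing))
    else:
        print("All optional packages are available.")
```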


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1342049909


##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -17,7 +17,7 @@
 
 import inspect
 import unittest

Review Comment:
   We could just write our own instead of copying. It's just a version string check in the end. I have not seen their code yet so I can write one on my own too :-).



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1341950255


##########
python/pyspark/pandas/plot/matplotlib.py:
##########
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-from distutils.version import LooseVersion
+from packaging.version import Version

Review Comment:
   Oh, actually we can't easily add a required dependency. It works with `pip install`, but it wouldn't work when users download Spark from the official website, because they would have to manually install the `packaging` dependency on their nodes. In the case of Py4J, we already include it in our release, so it works.
   
   We should either port `packaging` into our release, or find a workaround to avoid adding the required dependency.
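For context on the "port into our release" option: the Spark distribution ships Py4J as a source archive (`python/lib/py4j-0.10.9.7-src.zip`), and the launcher scripts are assumed here to put that archive on `PYTHONPATH`. Python's built-in zip import support then loads pure-Python code straight from the archive. A minimal sketch of that mechanism, using a hypothetical stand-in package named `mini_dep` rather than `packaging` itself:

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical package name standing in for a bundled pure-Python dependency.
BUNDLED_NAME = "mini_dep"


def build_source_zip(directory: str) -> str:
    """Write a tiny pure-Python package into a zip archive and return its path."""
    zip_path = os.path.join(directory, BUNDLED_NAME + "-src.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(BUNDLED_NAME + "/__init__.py", "VERSION = '1.0'\n")
    return zip_path


def import_from_zip(zip_path: str):
    """Put the archive on sys.path (as a launcher script might) and import from it."""
    sys.path.insert(0, zip_path)
    try:
        return importlib.import_module(BUNDLED_NAME)
    finally:
        # Undo the path change so the demo leaves no trace on sys.path.
        sys.path.remove(zip_path)
        sys.path_importer_cache.pop(zip_path, None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        module = import_from_zip(build_source_zip(tmp))
        print(module.VERSION, module.__file__)
```

This only covers the loading side; it says nothing about the licensing review discussed in this thread.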



[GitHub] [spark] viirya commented on a diff in pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43184:
URL: https://github.com/apache/spark/pull/43184#discussion_r1342009706


##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -17,7 +17,7 @@
 
 import inspect
 import unittest

Review Comment:
   It also sounds good if `LooseVersion` is easy to port into Spark.



[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741647052

   Could you review this, @LuciferYang ?


[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1742154344

   > Do you want to run Python 3.12 test in CI?
   
   Not yet. (1) Python 3.12 is not released yet; it will be released tomorrow. We can add Python 3.12 to the `actions/setup-python` GitHub Actions CI later. (2) We may need to add a separate daily pipeline.
   - https://www.python.org/downloads/
   


[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1741845399

   Thank you for the review, @viirya.
   
   You are right. We consider them optional, @viirya . 
   > So for users who download Spark from the official website, they are required to install these dependencies by themselves, right?
   
   Actually, the `pyspark` shell fails to start. So, we need to embed `packaging` like Py4J.
   > I suppose that you can run PySpark from downloaded distribution without packaging if not touching connect and pandas functions.
   
   https://github.com/apache/spark/blob/master/python/lib/py4j-0.10.9.7-src.zip
   
   However, embedding is not recommended by the official Python community, so I didn't do that in this PR. I'll handle that usability issue in an independent JIRA.


[GitHub] [spark] dongjoon-hyun commented on pull request #43184: [SPARK-44120][PYTHON] Support Python 3.12

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43184:
URL: https://github.com/apache/spark/pull/43184#issuecomment-1742179536

   Thank you, @viirya and all.
   Merged to master for Apache Spark 4.0.0.

