Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/09/05 06:33:07 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #42814: [DO-NOT-MERGE][POC] Separate PySpark distribution into pyspark, pyspark_common and pyspark_connect

HyukjinKwon opened a new pull request, #42814:
URL: https://github.com/apache/spark/pull/42814

   ### What changes were proposed in this pull request?
   
   This PR proposes to separate PySpark distribution into `pyspark`, `pyspark_common` and `pyspark_connect`.
   End users would be able to do the following:
   - `pip install pyspark`: Same namespace. Users can use PySpark both with and without Spark Connect.
   - `pip install pyspark pyspark_common --no-deps`: Same namespace. Users can use only the non-Spark-Connect functionality.
   - `pip install pyspark_connect`: Different namespace. Users can use only Spark Connect.
   - `pip install pyspark_common`: Different namespace. Users can use the common data types, such as `Row`, via `from pyspark_common.sql.types import Row`.
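   The namespace split above could be exercised from user code with an import-fallback shim like the sketch below. The package and module names follow this PR's proposal; the try/except fallback logic itself is purely illustrative and not part of the change.

   ```python
   # Illustrative only: pick a SparkSession entry point depending on which of
   # the proposed distributions is installed. The `pyspark_connect` namespace
   # follows this PR's proposal; the fallback chain is an assumption, not
   # something the PR itself implements.
   try:
       # Spark Connect-only distribution exposes its own namespace.
       from pyspark_connect.sql.session import SparkSession
       backend = "connect"
   except ImportError:
       try:
           # The full distribution keeps the existing `pyspark` namespace.
           from pyspark.sql import SparkSession
           backend = "classic"
       except ImportError:
           # Neither distribution installed.
           SparkSession = None
           backend = "none"

   print(f"available backend: {backend}")
   ```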
   
   ### Why are the changes needed?
   
   - To allow a pure Python library that works with Spark Connect.
   - To give end users the option to include or exclude Spark Connect.
   
   ### Does this PR introduce _any_ user-facing change?
   
   - No, `pip install pyspark` continues to work.
   - Existing imports also continue to work, e.g., `from pyspark import errors`.
   
   ### How was this patch tested?
   
   - Testing `pyspark`
       ```bash
       ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
       cd python
       ```
   
       ```python
       from pyspark.sql import SparkSession
       SparkSession.builder.getOrCreate().range(10).show()
       SparkSession.builder.getOrCreate().stop()
       SparkSession.builder.remote("sc://localhost:15002").getOrCreate().range(10).show()
       SparkSession.builder.getOrCreate().stop()
       ```
   
   - Testing `pyspark_common`
   
       ```bash
       cd python
       ```
   
       ```python
       from pyspark_common.sql.types import Row
       Row(a=1)
       ```
   
   - Testing `pyspark_connect`
   
       ```bash
       ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
       cd python
       ```
   
       ```python
       from pyspark_connect.sql.session import SparkSession
       SparkSession.builder.remote("sc://localhost:15002").getOrCreate().range(10).show()
       ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE][POC] Separate PySpark distribution into pyspark, pyspark_common and pyspark_connect [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #42814: [DO-NOT-MERGE][POC] Separate PySpark distribution into pyspark, pyspark_common and pyspark_connect
URL: https://github.com/apache/spark/pull/42814




Re: [PR] [DO-NOT-MERGE][POC] Separate PySpark distribution into pyspark, pyspark_common and pyspark_connect [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #42814:
URL: https://github.com/apache/spark/pull/42814#issuecomment-1858632385

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

