You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/01 16:07:22 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

grundprinzip opened a new pull request, #38470:
URL: https://github.com/apache/spark/pull/38470

   ### What changes were proposed in this pull request?
   
   This patch adds documentation to describe how clients should implement handling 
   connecting to the Spark Connect endpoint. GRPC as a protocol is well documented and has
   many options. However, this does not make it easy for users to reason about how to correctly
   configure GRPC to make it work for their use cases. 
   
   To overcome the issue, this document defines a client connection string that needs to be parsed
   by the different language clients and can be used to properly configure the GRPC client.
   
   
   ### Why are the changes needed?
   
   Documentation and design specification for clients implementing the Spark Connect protocol.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Doc only.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38470:
URL: https://github.com/apache/spark/pull/38470#discussion_r1011045449


##########
connector/connect/doc/client_connection_string.md:
##########
@@ -0,0 +1,110 @@
+# Connecting to Spark Connect using Clients

Review Comment:
   The usage documentation should better be in https://github.com/apache/spark/tree/master/docs as an md file so it can be properly documented. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299609859

   @HyukjinKwon I will add a Jira this is just the starting point to align where we want to go. 
   
   My idea would be that once this is merged I will create a pr for the python client to follow this proposal and then we can update this doc with an e2e example. 
   
   But the main goal here is developer documentation not user doc. For example, scala must then implement the same model. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1301480736

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299624291

   For developer documentation, it might better be placed under sources as a comment e.g., `packages.scala`. e.g.) https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L40-L123
   
   Or. maybe README.md at the top level of the compoenent https://github.com/apache/spark/blob/master/hadoop-cloud/README.md or https://github.com/apache/spark/blob/master/connector/docker/README.md
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299629632

   What about we link to it from the top level Readme in the component?
   
   The reason why it's not in the code is because it's client language agnostic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String
URL: https://github.com/apache/spark/pull/38470


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299418917

   Maybe it's better to have a JIRA. BTW, wonder if we have an e2e example for users can copy and paste to try. (e.g., like most of docs in https://spark.apache.org/docs/latest/index.html). Another decision to make is if we should document it in PySpark docs (https://spark.apache.org/docs/latest/api/python/getting_started/index.html) or Spark main page (https://spark.apache.org/docs/latest/index.html)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1301523665

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299977630

   @HyukjinKwon so my proposal is the following: 
   
    * I moved the README from the pyspark folder into the top-level `connector/connect` directory and made it clear that these are developer docs.
    * The `doc` directory was moved to `docs` to be consistent with the project
    * Linked from the top-level README to the connection string configuration doc.
   
   WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299054036

   Overall LGTM
   
   Is the `user_id` (or the user session token) be relevant to this doc?https://github.com/apache/spark/blob/8f6b18536e44ffd36656ceb56a434e399ad6d1b8/python/pyspark/sql/connect/client.py#L107
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299084859

   Good point, I will incorporate that into the doc. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299633920

   Maybe putting it to the top model level (`connector/connect/README.md`) for now could be a good idea (?). Just wanted to avoid a different structure compared to other compoenents (`connector/connect/doc`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org