You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/07 16:41:53 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

grundprinzip opened a new pull request, #38535:
URL: https://github.com/apache/spark/pull/38535

   ### What changes were proposed in this pull request?
   
   This patch provides small changes to the way the `user_id` is propagated in the SparkRemoteSession. Previously, the `user_id` was a required parameter, this is changed and it's inferred from one of the following configuration options in this order:
   
   1. The `user_id` is specified as a parameter to the connection string.
   2. The `user_id` is specified as a parameter to the `RemoteSparkSession`.
   3. If not defined it is inferred from the `$USER`
   
   In addition, since `token` can only be used for secure connections, it will implicitly set the `use_ssl` property of the connection.
   
   ### Why are the changes needed?
   Usability
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38535: [SPARK-41001][CONNECT] Make `userId` optional in SparkRemoteSession

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38535:
URL: https://github.com/apache/spark/pull/38535#issuecomment-1311170098

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1019656175


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -195,7 +195,15 @@ def test_invalid_connection_strings(self):
         for i in invalid:
             self.assertRaises(AttributeError, ChannelBuilder, i)
 
-        self.assertRaises(AttributeError, ChannelBuilder("sc://host/;token=123").to_channel)
+    def test_sensible_defaults(self):
+        chan = ChannelBuilder("sc://host")

Review Comment:
   I might has missed your previous discussion:
   
   why using, for example, `http` which is not a true http protocol in the underlying implementation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018850304


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -195,7 +195,15 @@ def test_invalid_connection_strings(self):
         for i in invalid:
             self.assertRaises(AttributeError, ChannelBuilder, i)
 
-        self.assertRaises(AttributeError, ChannelBuilder("sc://host/;token=123").to_channel)
+    def test_sensible_defaults(self):
+        chan = ChannelBuilder("sc://host")

Review Comment:
   we already discussed but just as a reminder .. would better change `sc` to others ... like `connect` or `http` .. anything else (in a separate PR).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1015890739


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`
+        user_id : Optional[str]
+            Optional unique user ID that is used to differentiate multiple users and
+            isolate their Spark Sessions. If the `user_id` is not set, will default to
+            the $USER environment. Defining the user ID as part of the connection string
+            takes precedence.
         """
-
         # Parse the connection string.
         self._builder = ChannelBuilder(connection_string)
-        self._user_id = user_id
+        self._user_id = None
+        if self._builder.user_id is not None:
+            self._user_id = self._builder.user_id
+        elif user_id is not None:
+            self._user_id = user_id
+        else:
+            self._user_id = os.getenv("USER", None)

Review Comment:
   Yeah I was thinking about this: user can explicitly use other's session. 
   
   This though gives implicit way that people might not aware but that might be rare. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018848047


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`
+        user_id : Optional[str]
+            Optional unique user ID that is used to differentiate multiple users and
+            isolate their Spark Sessions. If the `user_id` is not set, will default to
+            the $USER environment. Defining the user ID as part of the connection string
+            takes precedence.
         """
-
         # Parse the connection string.
         self._builder = ChannelBuilder(connection_string)
-        self._user_id = user_id
+        self._user_id = None
+        if self._builder.user_id is not None:
+            self._user_id = self._builder.user_id
+        elif user_id is not None:
+            self._user_id = user_id
+        else:
+            self._user_id = os.getenv("USER", None)

Review Comment:
   Should we probably match this to `SparkContext.sparkUser` (or at least make it similar)? e.g., SPARK_USER env



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018850304


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -195,7 +195,15 @@ def test_invalid_connection_strings(self):
         for i in invalid:
             self.assertRaises(AttributeError, ChannelBuilder, i)
 
-        self.assertRaises(AttributeError, ChannelBuilder("sc://host/;token=123").to_channel)
+    def test_sensible_defaults(self):
+        chan = ChannelBuilder("sc://host")

Review Comment:
   we already discussed but just as a reminder .. would better change `sc` to others ... like `connect`, `cnt`, or `http` .. anything else (in a separate PR).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018891531


##########
python/pyspark/sql/connect/client.py:
##########
@@ -125,13 +126,30 @@ def metadata(self) -> typing.Iterable[typing.Tuple[str, str]]:
 
     @property
     def secure(self) -> bool:
+        if self._token is not None:
+            return True
+
         value = self.params.get(ChannelBuilder.PARAM_USE_SSL, "")
         return value.lower() == "true"
 
     @property
     def endpoint(self) -> str:
         return f"{self.host}:{self.port}"
 
+    @property
+    def _token(self) -> Optional[str]:
+        return self.params.get(ChannelBuilder.PARAM_TOKEN, None)
+
+    @property
+    def user_id(self) -> Optional[str]:

Review Comment:
   I changed it please have a look.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38535:
URL: https://github.com/apache/spark/pull/38535#issuecomment-1308229533

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1015885425


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`
+        user_id : Optional[str]
+            Optional unique user ID that is used to differentiate multiple users and
+            isolate their Spark Sessions. If the `user_id` is not set, will default to
+            the $USER environment. Defining the user ID as part of the connection string
+            takes precedence.
         """
-
         # Parse the connection string.
         self._builder = ChannelBuilder(connection_string)
-        self._user_id = user_id
+        self._user_id = None
+        if self._builder.user_id is not None:
+            self._user_id = self._builder.user_id
+        elif user_id is not None:
+            self._user_id = user_id
+        else:
+            self._user_id = os.getenv("USER", None)

Review Comment:
   Yes, but this is today already the case if you manually specify twice the same user ID.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1015890739


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`
+        user_id : Optional[str]
+            Optional unique user ID that is used to differentiate multiple users and
+            isolate their Spark Sessions. If the `user_id` is not set, will default to
+            the $USER environment. Defining the user ID as part of the connection string
+            takes precedence.
         """
-
         # Parse the connection string.
         self._builder = ChannelBuilder(connection_string)
-        self._user_id = user_id
+        self._user_id = None
+        if self._builder.user_id is not None:
+            self._user_id = self._builder.user_id
+        elif user_id is not None:
+            self._user_id = user_id
+        else:
+            self._user_id = os.getenv("USER", None)

Review Comment:
   Yeah I was thinking about this: user can explicitly use other's session. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1019664007


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -195,7 +195,15 @@ def test_invalid_connection_strings(self):
         for i in invalid:
             self.assertRaises(AttributeError, ChannelBuilder, i)
 
-        self.assertRaises(AttributeError, ChannelBuilder("sc://host/;token=123").to_channel)
+    def test_sensible_defaults(self):
+        chan = ChannelBuilder("sc://host")

Review Comment:
   sounds good then



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #38535:
URL: https://github.com/apache/spark/pull/38535#issuecomment-1306176335

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38535: [SPARK-41001][CONNECT] Make `userId` optional in SparkRemoteSession

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38535:
URL: https://github.com/apache/spark/pull/38535#issuecomment-1311169723

   There's an existing comment https://github.com/apache/spark/pull/38535#discussion_r1018848047. But could handle it separately. Let me merge this in first


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018846038


##########
python/pyspark/sql/connect/client.py:
##########
@@ -125,13 +126,30 @@ def metadata(self) -> typing.Iterable[typing.Tuple[str, str]]:
 
     @property
     def secure(self) -> bool:
+        if self._token is not None:
+            return True
+
         value = self.params.get(ChannelBuilder.PARAM_USE_SSL, "")
         return value.lower() == "true"
 
     @property
     def endpoint(self) -> str:
         return f"{self.host}:{self.port}"
 
+    @property
+    def _token(self) -> Optional[str]:
+        return self.params.get(ChannelBuilder.PARAM_TOKEN, None)
+
+    @property
+    def user_id(self) -> Optional[str]:

Review Comment:
   should probably follow camelCase ..  Are these a lot to fix? Wonder if we can fix them here together if there aren't.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018868868


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -195,7 +195,15 @@ def test_invalid_connection_strings(self):
         for i in invalid:
             self.assertRaises(AttributeError, ChannelBuilder, i)
 
-        self.assertRaises(AttributeError, ChannelBuilder("sc://host/;token=123").to_channel)
+    def test_sensible_defaults(self):
+        chan = ChannelBuilder("sc://host")

Review Comment:
   yes, i will change it to "http" makes the parsing above easier as well, but needs a separate PR because it needs to update the dev doc as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #38535: [SPARK-41001][CONNECT] Make `userId` optional in SparkRemoteSession

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38535: [SPARK-41001][CONNECT] Make `userId` optional in SparkRemoteSession
URL: https://github.com/apache/spark/pull/38535


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1019663271


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -195,7 +195,15 @@ def test_invalid_connection_strings(self):
         for i in invalid:
             self.assertRaises(AttributeError, ChannelBuilder, i)
 
-        self.assertRaises(AttributeError, ChannelBuilder("sc://host/;token=123").to_channel)
+    def test_sensible_defaults(self):
+        chan = ChannelBuilder("sc://host")

Review Comment:
   GRPC uses http2 so HTTP is most likely the most relevant schema but so could be h2. 
   
   The reason to pick http is that it aligns nicely with what customers and users might expect. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018848047


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`
+        user_id : Optional[str]
+            Optional unique user ID that is used to differentiate multiple users and
+            isolate their Spark Sessions. If the `user_id` is not set, will default to
+            the $USER environment. Defining the user ID as part of the connection string
+            takes precedence.
         """
-
         # Parse the connection string.
         self._builder = ChannelBuilder(connection_string)
-        self._user_id = user_id
+        self._user_id = None
+        if self._builder.user_id is not None:
+            self._user_id = self._builder.user_id
+        elif user_id is not None:
+            self._user_id = user_id
+        else:
+            self._user_id = os.getenv("USER", None)

Review Comment:
   Should we probably match this to `SparkContext.sparkUser`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1018869517


##########
python/pyspark/sql/connect/client.py:
##########
@@ -125,13 +126,30 @@ def metadata(self) -> typing.Iterable[typing.Tuple[str, str]]:
 
     @property
     def secure(self) -> bool:
+        if self._token is not None:
+            return True
+
         value = self.params.get(ChannelBuilder.PARAM_USE_SSL, "")
         return value.lower() == "true"
 
     @property
     def endpoint(self) -> str:
         return f"{self.host}:{self.port}"
 
+    @property
+    def _token(self) -> Optional[str]:
+        return self.params.get(ChannelBuilder.PARAM_TOKEN, None)
+
+    @property
+    def user_id(self) -> Optional[str]:

Review Comment:
   I can fix this here (still struggling to get used to camelCase method names) :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #38535:
URL: https://github.com/apache/spark/pull/38535#issuecomment-1310963175

   @HyukjinKwon please take another look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1015876295


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`
+        user_id : Optional[str]
+            Optional unique user ID that is used to differentiate multiple users and
+            isolate their Spark Sessions. If the `user_id` is not set, will default to
+            the $USER environment. Defining the user ID as part of the connection string
+            takes precedence.
         """
-
         # Parse the connection string.
         self._builder = ChannelBuilder(connection_string)
-        self._user_id = user_id
+        self._user_id = None
+        if self._builder.user_id is not None:
+            self._user_id = self._builder.user_id
+        elif user_id is not None:
+            self._user_id = user_id
+        else:
+            self._user_id = os.getenv("USER", None)

Review Comment:
   ahh I am a bit conservative here: 
   
   Is it possible that two independent devices have the same  "USER", which in turn make two devices share the same SparkSession remotely without awareness? 



##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`

Review Comment:
   typo: `sc://localhost`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1015885676


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`

Review Comment:
   ```suggestion
               the GRPC connection. Defaults to `sc://localhost`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001][CONNECT] Make `userId` optional in SparkRemoteSession

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38535:
URL: https://github.com/apache/spark/pull/38535#discussion_r1020034764


##########
python/pyspark/sql/connect/client.py:
##########
@@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult":
 class RemoteSparkSession(object):
     """Conceptually the remote spark session that communicates with the server"""
 
-    def __init__(self, user_id: str, connection_string: str = "sc://localhost"):
+    def __init__(self, connection_string: str = "sc://localhost", user_id: Optional[str] = None):
         """
         Creates a new RemoteSparkSession for the Spark Connect interface.
 
         Parameters
         ----------
-        user_id : str
-            Unique User ID that is used to differentiate multiple users and
-            isolate their Spark Sessions.
-        connection_string: str
+        connection_string: Optional[str]
             Connection string that is used to extract the connection parameters and configure
-            the GRPC connection.
+            the GRPC connection. Defaults to `sc://localhos`
+        user_id : Optional[str]
+            Optional unique user ID that is used to differentiate multiple users and
+            isolate their Spark Sessions. If the `user_id` is not set, will default to
+            the $USER environment. Defining the user ID as part of the connection string
+            takes precedence.
         """
-
         # Parse the connection string.
         self._builder = ChannelBuilder(connection_string)
-        self._user_id = user_id
+        self._user_id = None
+        if self._builder.user_id is not None:
+            self._user_id = self._builder.user_id
+        elif user_id is not None:
+            self._user_id = user_id
+        else:
+            self._user_id = os.getenv("USER", None)

Review Comment:
   This is on the client side where there is no `SparkContext`. I think in a follow up we can potentially use an additional env var called `SPARK_USER` if necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org