Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/29 10:39:43 UTC

[GitHub] [spark] dchvn opened a new pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

dchvn opened a new pull request #34439:
URL: https://github.com/apache/spark/pull/34439


   ### What changes were proposed in this pull request?
   Inline type hints for python/pyspark/broadcast.py
   ### Why are the changes needed?
   We can take advantage of static type checking within the functions by inlining the type hints.
   ### Does this PR introduce _any_ user-facing change?
   No
   ### How was this patch tested?
   Existing tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r754106603



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,
+    ):
         """
         Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
             # we're on the driver.  We want the pickled data to end up in a file (maybe encrypted)
-            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)
+            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)  # type: ignore[attr-defined]
             self._path = f.name
-            self._sc = sc
-            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)
-            if sc._encryption_enabled:
+            self._sc: Optional["SparkContext"] = sc
+            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)  # type: ignore[attr-defined]
+            if sc._encryption_enabled:  # type: ignore[attr-defined]
                 # with encryption, we ask the jvm to do the encryption for us, we send it data
                 # over a socket
                 port, auth_secret = self._python_broadcast.setupEncryptionServer()
                 (encryption_sock_file, _) = local_connect_and_auth(port, auth_secret)
-                broadcast_out = ChunkedStream(encryption_sock_file, 8192)
+                broadcast_out: Union[ChunkedStream, IO[bytes]] = ChunkedStream(
+                    encryption_sock_file, 8192
+                )
             else:
                 # no encryption, we can just write pickled data directly to the file from python
                 broadcast_out = f
-            self.dump(value, broadcast_out)
-            if sc._encryption_enabled:
+            self.dump(cast(T, value), broadcast_out)

Review comment:
       @HyukjinKwon @itholic  @ueshin  @xinrong-databricks  WDYT? 
   
   Do we need a fancy overload on `__init__` here? Something along these lines:
   
   ```python
       @overload  # On driver
       def __init__(self: "Broadcast[T]", sc: "SparkContext", value: T, pickle_registry: "BroadcastPickleRegistry"): ...
       @overload  # On worker without decryption server
       def __init__(self: "Broadcast[Any]", *, path: str): ...  # This is a placeholder for an arbitrary value, so not Broadcast[None]
       @overload  # On worker with decryption server
       def __init__(self: "Broadcast[Any]", *, sock_file: str): ...  # Ditto
   ```
   
   `cast` definitely seems wrong, because we know that this value can be `None` in this control flow (in contrast to many optional fields we access that, under normal operating conditions, we know are not null). If anything, it should be silenced with `# type: ignore`.
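   A minimal, self-contained sketch of the overload idea, runnable without PySpark. `MiniBroadcast`, its `registry` argument and the `mode`/`_source` attributes are hypothetical simplifications, not the actual `Broadcast` API. The point is that the `self` annotation on each overload tells a checker such as mypy which parameterization the new instance gets, while the implementation keeps `value` as `Optional[T]` instead of casting it.

   ```python
   from typing import Any, Generic, Optional, Set, TypeVar, overload

   T = TypeVar("T")


   class MiniBroadcast(Generic[T]):
       """Hypothetical, simplified stand-in for pyspark.broadcast.Broadcast."""

       @overload  # "driver": a concrete value, so the instance is MiniBroadcast[T]
       def __init__(self: "MiniBroadcast[T]", value: T, registry: Set[int]) -> None: ...

       @overload  # "worker" reading a file: the value type is unknown statically
       def __init__(self: "MiniBroadcast[Any]", *, path: str) -> None: ...

       @overload  # "worker" reading through a decryption server socket
       def __init__(self: "MiniBroadcast[Any]", *, sock_file: str) -> None: ...

       def __init__(
           self,
           value: Optional[T] = None,
           registry: Optional[Set[int]] = None,
           *,
           path: Optional[str] = None,
           sock_file: Optional[str] = None,
       ) -> None:
           # Still Optional[T] here: None is a legitimate value, so no cast(T, value)
           self._value = value
           self._source: Optional[str] = None
           if registry is not None:
               self.mode = "driver"
               registry.add(id(self))
           elif path is not None:
               self.mode = "worker-file"
               self._source = path
           else:
               self.mode = "worker-socket"
               self._source = sock_file
   ```

   With these overloads, `MiniBroadcast([1, 2, 3], set())` should be inferred as a `MiniBroadcast` of `list[int]`, while `MiniBroadcast(path=...)` becomes `MiniBroadcast[Any]`, matching the "placeholder for an arbitrary value" comment.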






[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992086510


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50590/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992107037


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50590/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983249533


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50255/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r759804329



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,
+    ):
         """
         Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
             # we're on the driver.  We want the pickled data to end up in a file (maybe encrypted)
-            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)
+            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)  # type: ignore[attr-defined]
             self._path = f.name
-            self._sc = sc
-            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)
-            if sc._encryption_enabled:
+            self._sc: Optional["SparkContext"] = sc
+            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)  # type: ignore[attr-defined]
+            if sc._encryption_enabled:  # type: ignore[attr-defined]
                 # with encryption, we ask the jvm to do the encryption for us, we send it data
                 # over a socket
                 port, auth_secret = self._python_broadcast.setupEncryptionServer()
                 (encryption_sock_file, _) = local_connect_and_auth(port, auth_secret)
-                broadcast_out = ChunkedStream(encryption_sock_file, 8192)
+                broadcast_out: Union[ChunkedStream, IO[bytes]] = ChunkedStream(
+                    encryption_sock_file, 8192
+                )
             else:
                 # no encryption, we can just write pickled data directly to the file from python
                 broadcast_out = f
-            self.dump(value, broadcast_out)
-            if sc._encryption_enabled:
+            self.dump(cast(T, value), broadcast_out)

Review comment:
       me too






[GitHub] [spark] ueshin commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r783469335



##########
File path: python/pyspark/broadcast.py
##########
@@ -175,27 +223,27 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], "Broadcast[T]"], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
+        cast("BroadcastPickleRegistry", self._pickle_registry).add(self)

Review comment:
       Should this be:
   ```py
   assert self._pickle_registry is not None
   self._pickle_registry.add(self)
   ```
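   To illustrate the difference, here is a small sketch with hypothetical `Registry` and `Holder` classes standing in for `BroadcastPickleRegistry` and `Broadcast`. `cast` only changes what the type checker believes and does nothing at runtime, whereas `assert ... is not None` both narrows `Optional[Registry]` to `Registry` for the checker and fails with an explicit `AssertionError` if the invariant is ever broken. One caveat: assertions are stripped when Python runs with `-O`.

   ```python
   from typing import Optional, Set, cast


   class Registry:
       """Hypothetical minimal registry standing in for BroadcastPickleRegistry."""

       def __init__(self) -> None:
           self.items: Set[int] = set()

       def add(self, item: int) -> None:
           self.items.add(item)


   class Holder:
       """Hypothetical owner of an optional registry, like Broadcast._pickle_registry."""

       def __init__(self, registry: Optional[Registry] = None) -> None:
           self._registry = registry

       def register_with_cast(self, item: int) -> None:
           # cast() silences the checker only; a None registry still fails here,
           # with a less helpful AttributeError on NoneType
           cast(Registry, self._registry).add(item)

       def register_with_assert(self, item: int) -> None:
           # The assert narrows Optional[Registry] to Registry for the checker
           # and documents the invariant at runtime
           assert self._registry is not None
           self._registry.add(item)
   ```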






[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992092768


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146115/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954661653


   **[Test build #144752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144752/testReport)** for PR 34439 at commit [`c7d3417`](https://github.com/apache/spark/commit/c7d3417293c5de7e2dd8378891c766489016f246).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954735990


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49221/
   




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753856072



##########
File path: python/pyspark/broadcast.py
##########
@@ -175,27 +203,27 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], T], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
-        return _from_id, (self._jbroadcast.id(),)
+        cast(Any, self._pickle_registry).add(self)
+        return cast(Tuple[Callable[[int], T], Tuple[int]], (_from_id, (self._jbroadcast.id(),)))
 
 
 class BroadcastPickleRegistry(threading.local):
     """Thread-local registry for broadcast variables that have been pickled"""
 
-    def __init__(self):
+    def __init__(self) -> None:
         self.__dict__.setdefault("_registry", set())
 
-    def __iter__(self):
+    def __iter__(self) -> Iterator[Broadcast]:
         for bcast in self._registry:
             yield bcast
 
-    def add(self, bcast):
+    def add(self, bcast: Any) -> None:

Review comment:
       `bcast: Broadcast`?
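   A sketch of the suggested signature, using a hypothetical `Item` class in place of `Broadcast`. It mirrors the `setdefault` pattern from the diff; the class-level `_registry: Set[Item]` annotation is an addition so that a checker knows the attribute's type. A bare annotation creates no class attribute at runtime, so each thread still gets its own set.

   ```python
   import threading
   from typing import Iterator, Set


   class Item:
       """Hypothetical stand-in for Broadcast."""

       def __init__(self, name: str) -> None:
           self.name = name


   class ItemRegistry(threading.local):
       """Thread-local registry typed with the concrete item class instead of Any."""

       _registry: Set[Item]  # annotation only; the per-thread set is created in __init__

       def __init__(self) -> None:
           # Mirrors BroadcastPickleRegistry.__init__ from the diff above
           self.__dict__.setdefault("_registry", set())

       def __iter__(self) -> Iterator[Item]:
           for item in self._registry:
               yield item

       def add(self, item: Item) -> None:  # `item: Item` rather than `item: Any`
           self._registry.add(item)
   ```

   Because `threading.local` re-runs `__init__` the first time an instance is used in a new thread, items added in one thread are not visible from another.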






[GitHub] [spark] SparkQA removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983224101


   **[Test build #145781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145781/testReport)** for PR 34439 at commit [`1308016`](https://github.com/apache/spark/commit/13080168236f3084cb2250478215acd525b85701).




[GitHub] [spark] dchvn commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r759809276



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,
+    ):
         """
         Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
             # we're on the driver.  We want the pickled data to end up in a file (maybe encrypted)
-            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)
+            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)  # type: ignore[attr-defined]
             self._path = f.name
-            self._sc = sc
-            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)
-            if sc._encryption_enabled:
+            self._sc: Optional["SparkContext"] = sc
+            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)  # type: ignore[attr-defined]
+            if sc._encryption_enabled:  # type: ignore[attr-defined]
                 # with encryption, we ask the jvm to do the encryption for us, we send it data
                 # over a socket
                 port, auth_secret = self._python_broadcast.setupEncryptionServer()
                 (encryption_sock_file, _) = local_connect_and_auth(port, auth_secret)
-                broadcast_out = ChunkedStream(encryption_sock_file, 8192)
+                broadcast_out: Union[ChunkedStream, IO[bytes]] = ChunkedStream(
+                    encryption_sock_file, 8192
+                )
             else:
                 # no encryption, we can just write pickled data directly to the file from python
                 broadcast_out = f
-            self.dump(value, broadcast_out)
-            if sc._encryption_enabled:
+            self.dump(cast(T, value), broadcast_out)

Review comment:
       Updated: added the overloads on `__init__` and replaced `cast` with `# type: ignore`. Thanks!






[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966060536


   **[Test build #145092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145092/testReport)** for PR 34439 at commit [`1b89310`](https://github.com/apache/spark/commit/1b893106f58b48d4240f94d8866620c0d5fc53e4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class Broadcast(Generic[T]):`




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983237879


   **[Test build #145781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145781/testReport)** for PR 34439 at commit [`1308016`](https://github.com/apache/spark/commit/13080168236f3084cb2250478215acd525b85701).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983239937


   **[Test build #145782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145782/testReport)** for PR 34439 at commit [`cc65b8c`](https://github.com/apache/spark/commit/cc65b8c26c9e33ada63ade7218d2ea869010d0a6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983276401


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50255/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992092768


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146115/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954735990


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49221/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954695002


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144752/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971142238


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145302/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971139468


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49772/
   




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r781536111



##########
File path: python/pyspark/context.py
##########
@@ -1183,7 +1183,7 @@ def union(self, rdds: List["RDD[T]"]) -> "RDD[T]":
             jrdds[i] = rdds[i]._jrdd  # type: ignore[attr-defined]
         return RDD(self._jsc.union(jrdds), self, rdds[0]._jrdd_deserializer)  # type: ignore[attr-defined]
 
-    def broadcast(self, value: T) -> "Broadcast[T]":
+    def broadcast(self, value: T) -> "Broadcast":

Review comment:
       After `Broadcast` is made `Generic` again this should be `Broadcast[T]`.
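Here is a minimal sketch of what the suggested `Broadcast[T]` return type gives callers. It assumes `Broadcast` is declared as `Generic[T]`. Both classes are hypothetical stand-ins: `Broadcast` is reduced to its typing shape, and `SparkContextSketch` is not PySpark's `SparkContext`.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Broadcast(Generic[T]):
    """Hypothetical reduction of pyspark.broadcast.Broadcast to its typing shape."""

    def __init__(self, value: T) -> None:
        self._value = value

    @property
    def value(self) -> T:
        # Because the class is generic, the checker knows .value is a T.
        return self._value


class SparkContextSketch:
    """Hypothetical stand-in for SparkContext; only the broadcast signature matters."""

    def broadcast(self, value: T) -> "Broadcast[T]":
        # The element type flows through: broadcast([1, 2]) is inferred as
        # Broadcast[List[int]], so .value is List[int] rather than Any.
        return Broadcast(value)
```

With the non-generic `-> "Broadcast"` annotation, the checker would treat `.value` as untyped.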




[GitHub] [spark] HyukjinKwon commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-1011944338


   Merged to master.


[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966172142


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49561/
   


[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954708644


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49221/
   


[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753855676



##########
File path: python/pyspark/broadcast.py
##########
@@ -175,27 +203,27 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], T], Tuple[int]]:

Review comment:
       This signature seems to be wrong. Double checking the flow:
   
   -  `_from_id` returns `_broadcastRegistry[bid]`
   - `_broadcastRegistry` is  `_broadcastRegistry: Dict[int, "Broadcast[Any]"]`
   -  So `_from_id` is either `Callable[[int],  Broadcast[T]]`, or if it doesn't type check, `Callable[[int], Broadcast[Any]]`




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983276401


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50255/
   


[GitHub] [spark] SparkQA removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992069186


   **[Test build #146115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146115/testReport)** for PR 34439 at commit [`6905265`](https://github.com/apache/spark/commit/6905265c2374e15479c241752eb19a24ac5e3589).


[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954678753


   **[Test build #144752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144752/testReport)** for PR 34439 at commit [`c7d3417`](https://github.com/apache/spark/commit/c7d3417293c5de7e2dd8378891c766489016f246).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class Broadcast(Generic[T]):`


[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-957181189


   CC @HyukjinKwon @zero323 @ueshin too. Many thanks!


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966075665


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145092/
   


[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966165831


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49561/
   


[GitHub] [spark] SparkQA removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966045479


   **[Test build #145092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145092/testReport)** for PR 34439 at commit [`1b89310`](https://github.com/apache/spark/commit/1b893106f58b48d4240f94d8866620c0d5fc53e4).


[GitHub] [spark] dchvn commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r754040175



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,
+    ):
         """
         Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
             # we're on the driver.  We want the pickled data to end up in a file (maybe encrypted)
-            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)
+            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)  # type: ignore[attr-defined]
             self._path = f.name
-            self._sc = sc
-            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)
-            if sc._encryption_enabled:
+            self._sc: Optional["SparkContext"] = sc
+            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)  # type: ignore[attr-defined]
+            if sc._encryption_enabled:  # type: ignore[attr-defined]
                 # with encryption, we ask the jvm to do the encryption for us, we send it data
                 # over a socket
                 port, auth_secret = self._python_broadcast.setupEncryptionServer()
                 (encryption_sock_file, _) = local_connect_and_auth(port, auth_secret)
-                broadcast_out = ChunkedStream(encryption_sock_file, 8192)
+                broadcast_out: Union[ChunkedStream, IO[bytes]] = ChunkedStream(
+                    encryption_sock_file, 8192
+                )
             else:
                 # no encryption, we can just write pickled data directly to the file from python
                 broadcast_out = f
-            self.dump(value, broadcast_out)
-            if sc._encryption_enabled:
+            self.dump(cast(T, value), broadcast_out)

Review comment:
       yes




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753858723



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,

Review comment:
       `Optional[str]`?




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975400484


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49979/
   


[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983276375


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50255/
   


[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992069186


   **[Test build #146115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146115/testReport)** for PR 34439 at commit [`6905265`](https://github.com/apache/spark/commit/6905265c2374e15479c241752eb19a24ac5e3589).


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992116255


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50590/
   


[GitHub] [spark] zero323 commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-1007619850


   Could you please resolve the conflicts @dchvn?


[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753859416



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,
+    ):
         """
         Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
             # we're on the driver.  We want the pickled data to end up in a file (maybe encrypted)
-            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)
+            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)  # type: ignore[attr-defined]
             self._path = f.name
-            self._sc = sc
-            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)
-            if sc._encryption_enabled:
+            self._sc: Optional["SparkContext"] = sc
+            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)  # type: ignore[attr-defined]
+            if sc._encryption_enabled:  # type: ignore[attr-defined]
                 # with encryption, we ask the jvm to do the encryption for us, we send it data
                 # over a socket
                 port, auth_secret = self._python_broadcast.setupEncryptionServer()
                 (encryption_sock_file, _) = local_connect_and_auth(port, auth_secret)
-                broadcast_out = ChunkedStream(encryption_sock_file, 8192)
+                broadcast_out: Union[ChunkedStream, IO[bytes]] = ChunkedStream(
+                    encryption_sock_file, 8192
+                )
             else:
                 # no encryption, we can just write pickled data directly to the file from python
                 broadcast_out = f
-            self.dump(value, broadcast_out)
-            if sc._encryption_enabled:
+            self.dump(cast(T, value), broadcast_out)

Review comment:
       Why do we need `cast` here? Is this because of `Optional`?
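Suppose the `cast` is there because of `Optional`. A hypothetical reduction of `__init__` and `dump` shows why it would be needed. Here `on_driver` stands in for the `sc is not None` check, and `dumped` is an illustrative attribute that does not exist in PySpark. Because `T` is bound by the class, `dump` expects a `T`, and a checker such as mypy would reject an `Optional[T]` argument. `cast` changes only the static type and performs no runtime check.

```python
from typing import Generic, List, Optional, TypeVar, cast

T = TypeVar("T")


class BroadcastSketch(Generic[T]):
    """Hypothetical reduction of Broadcast.__init__/dump to show the cast."""

    def __init__(self, value: Optional[T] = None, on_driver: bool = True) -> None:
        self.dumped: List[T] = []  # illustrative only: records what dump() saw
        if on_driver:
            # `value` is Optional[T] only because executors construct the object
            # without one. Passing it straight to dump() would be flagged as
            # "Optional[T]" vs "T"; cast() tells the checker to trust us and
            # returns `value` unchanged at runtime (None is not rejected).
            self.dump(cast(T, value))

    def dump(self, value: T) -> None:
        self.dumped.append(value)
```

An `assert value is not None` would also narrow the type. However, it adds a runtime check that `cast` does not have.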




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753859047



##########
File path: python/pyspark/broadcast.py
##########
@@ -113,11 +141,11 @@ def dump(self, value, f):
             raise pickle.PicklingError(msg)
         f.close()
 
-    def load_from_path(self, path):
+    def load_from_path(self, path: Any) -> T:
         with open(path, "rb", 1 << 20) as f:
             return self.load(f)
 
-    def load(self, file):
+    def load(self, file: Any) -> T:

Review comment:
       Same as https://github.com/apache/spark/pull/34439/files#r753858983?




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753858864



##########
File path: python/pyspark/broadcast.py
##########
@@ -113,11 +141,11 @@ def dump(self, value, f):
             raise pickle.PicklingError(msg)
         f.close()
 
-    def load_from_path(self, path):
+    def load_from_path(self, path: Any) -> T:

Review comment:
       `path: str`?
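Here is a sketch of the suggested `path: str`. It assumes the path is always the temporary-file name, i.e. `f.name` of the `NamedTemporaryFile` created in `__init__`. These are hypothetical free functions, not the `Broadcast` methods, and `load` is reduced to a plain `pickle.load`. The `1 << 20` (1 MiB) read buffer is taken from the diff.

```python
import pickle
from typing import Any, BinaryIO


def load(file: BinaryIO) -> Any:
    # Hypothetical simplification of Broadcast.load: unpickle one object.
    return pickle.load(file)


def load_from_path(path: str) -> Any:
    # `path` is a file name string, so `str` documents intent better than `Any`.
    # Opened in binary mode with a 1 MiB buffer, as in the diff.
    with open(path, "rb", 1 << 20) as f:
        return load(f)
```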




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975400554


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49979/
   


[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-1008560825


   ping @zero323 :smile:  


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983275972


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50254/
   


[GitHub] [spark] SparkQA removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971118816


   **[Test build #145302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145302/testReport)** for PR 34439 at commit [`e304c6c`](https://github.com/apache/spark/commit/e304c6cd391a4f17d8410fc00de90d003d243552).


[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971118816


   **[Test build #145302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145302/testReport)** for PR 34439 at commit [`e304c6c`](https://github.com/apache/spark/commit/e304c6cd391a4f17d8410fc00de90d003d243552).


[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971142238


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145302/
   




[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-957181189


   CC @HyukjinKwon @zero323 @ueshin too. Many thanks!




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954661653


   **[Test build #144752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144752/testReport)** for PR 34439 at commit [`c7d3417`](https://github.com/apache/spark/commit/c7d3417293c5de7e2dd8378891c766489016f246).




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992081510


   **[Test build #146115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146115/testReport)** for PR 34439 at commit [`6905265`](https://github.com/apache/spark/commit/6905265c2374e15479c241752eb19a24ac5e3589).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992116255


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50590/
   




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753850759



##########
File path: python/pyspark/broadcast.py
##########
@@ -113,11 +141,11 @@ def dump(self, value, f):
             raise pickle.PicklingError(msg)
         f.close()
 
-    def load_from_path(self, path):
+    def load_from_path(self, path: Any) -> T:
         with open(path, "rb", 1 << 20) as f:
             return self.load(f)
 
-    def load(self, file):
+    def load(self, file: Any) -> T:

Review comment:
       `path: str`?
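For reference, a minimal sketch of what the two loaders could look like with `path: str` and a binary file handle. `SimpleBroadcast` is a hypothetical, stripped-down stand-in rather than the real `pyspark.Broadcast`, and typing `file` as `IO[bytes]` is an assumption, not something settled in this thread.

```python
import os
import pickle
import tempfile
from typing import IO, Generic, TypeVar

T = TypeVar("T")


class SimpleBroadcast(Generic[T]):
    """Hypothetical stand-in for pyspark.Broadcast; only the loaders are modeled."""

    def load_from_path(self, path: str) -> T:
        # Open with a 1 MiB buffer, as in the snippet under review.
        with open(path, "rb", 1 << 20) as f:
            return self.load(f)

    def load(self, file: IO[bytes]) -> T:
        # pickle.load returns Any; the annotation records the expected payload type.
        return pickle.load(file)


# Usage: round-trip a value through a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pickle.dump([1, 2, 3], tmp)
loaded = SimpleBroadcast[list]().load_from_path(tmp.name)
os.unlink(tmp.name)
```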






[GitHub] [spark] SparkQA removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975290186


   **[Test build #145507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145507/testReport)** for PR 34439 at commit [`c1cb255`](https://github.com/apache/spark/commit/c1cb255a87f219e28315598c8820f4b1c9cdd765).




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r754106603



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,
+    ):
         """
         Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
             # we're on the driver.  We want the pickled data to end up in a file (maybe encrypted)
-            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)
+            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)  # type: ignore[attr-defined]
             self._path = f.name
-            self._sc = sc
-            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)
-            if sc._encryption_enabled:
+            self._sc: Optional["SparkContext"] = sc
+            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)  # type: ignore[attr-defined]
+            if sc._encryption_enabled:  # type: ignore[attr-defined]
                 # with encryption, we ask the jvm to do the encryption for us, we send it data
                 # over a socket
                 port, auth_secret = self._python_broadcast.setupEncryptionServer()
                 (encryption_sock_file, _) = local_connect_and_auth(port, auth_secret)
-                broadcast_out = ChunkedStream(encryption_sock_file, 8192)
+                broadcast_out: Union[ChunkedStream, IO[bytes]] = ChunkedStream(
+                    encryption_sock_file, 8192
+                )
             else:
                 # no encryption, we can just write pickled data directly to the file from python
                 broadcast_out = f
-            self.dump(value, broadcast_out)
-            if sc._encryption_enabled:
+            self.dump(cast(T, value), broadcast_out)

Review comment:
       @HyukjinKwon @itholic @ueshin @xinrong-databricks WDYT?
   
   Do we need a fancy overload on `__init__` here? Something along these lines:
   
   ```python
       @overload  # On driver
       def __init__(self: "Broadcast[T]", sc: "SparkContext", value: T, pickle_registry: "BroadcastPickleRegistry"): ...
       @overload  # On worker without decryption server
       def __init__(self: "Broadcast[Any]", *, path: str): ...       # Any is a placeholder for an arbitrary value, so not Broadcast[None]
       @overload  # On worker with decryption server
       def __init__(self: "Broadcast[Any]", *, sock_file: str): ...  # Ditto
   ```
   
   `cast` definitely seems wrong, because we know that `value` can be `None` in this control flow (in contrast to many optional fields we access that, under normal operating conditions, we know are not `None`). If anything, the error should be silenced with `# type: ignore`.
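A self-contained sketch of how the suggested overloads could type-check, assuming hypothetical `FakeContext` and `FakeRegistry` classes in place of `SparkContext` and `BroadcastPickleRegistry`. The `str` types for `path` and `sock_file` follow the snippet above and are not verified against PySpark. Callers only see the three overloads; the permissive implementation signature is hidden from them.

```python
from typing import Any, Generic, Optional, TypeVar, overload

T = TypeVar("T")


class FakeContext:
    """Hypothetical stand-in for SparkContext."""


class FakeRegistry:
    """Hypothetical stand-in for BroadcastPickleRegistry."""


class Broadcast(Generic[T]):
    @overload  # On driver: the payload type is bound from `value`.
    def __init__(
        self: "Broadcast[T]",
        sc: FakeContext,
        value: T,
        pickle_registry: FakeRegistry,
    ) -> None: ...

    @overload  # On worker without decryption server.
    def __init__(self: "Broadcast[Any]", *, path: str) -> None: ...

    @overload  # On worker with decryption server.
    def __init__(self: "Broadcast[Any]", *, sock_file: str) -> None: ...

    def __init__(
        self,
        sc: Optional[FakeContext] = None,
        value: Optional[T] = None,
        pickle_registry: Optional[FakeRegistry] = None,
        path: Optional[str] = None,
        sock_file: Optional[str] = None,
    ) -> None:
        # The implementation signature stays permissive; callers only see the overloads.
        if sc is not None:
            self.origin = "driver"
        elif path is not None:
            self.origin = "worker (file)"
        elif sock_file is not None:
            self.origin = "worker (socket)"
        else:
            raise ValueError("expected sc, path or sock_file")
        self._value = value


driver_side = Broadcast(FakeContext(), [1, 2, 3], FakeRegistry())  # inferred as Broadcast[List[int]]
worker_side = Broadcast(path="/tmp/broadcast_0")  # Broadcast[Any]
```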






[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r781535651



##########
File path: examples/src/main/python/als.py
##########
@@ -94,8 +94,8 @@ def update(i, mat, ratings):
         msb = sc.broadcast(ms)
 
         us_ = sc.parallelize(range(U), partitions) \
-            .map(lambda x: update(x, msb.value, Rb.value.T)) \
-            .collect()
+            .map(lambda x: update(x, msb.value, Rb.value.T)).collect()  # type: ignore[attr-defined]

Review comment:
       Once you bring back `Generic` this shouldn't be necessary.
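A minimal sketch of why the ignore becomes unnecessary, assuming a simplified generic `Broadcast` that keeps its value in memory and a hypothetical `broadcast()` helper standing in for `SparkContext.broadcast`: with `Generic[T]`, a checker can infer the payload type, so attribute access on `.value` is checked against that type.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Broadcast(Generic[T]):
    """Simplified stand-in that keeps the value in memory."""

    def __init__(self, value: T) -> None:
        self._value = value

    @property
    def value(self) -> T:
        return self._value


def broadcast(value: T) -> Broadcast[T]:
    # Hypothetical helper standing in for SparkContext.broadcast.
    return Broadcast(value)


ratings = broadcast([[1.0, 2.0], [3.0, 4.0]])
# A checker infers Broadcast[List[List[float]]], so ratings.value[0] is
# List[float] and calls on it are checked without any type: ignore.
first_row: List[float] = ratings.value[0]
```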






[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753859207



##########
File path: python/pyspark/broadcast.py
##########
@@ -21,28 +21,47 @@
 from tempfile import NamedTemporaryFile
 import threading
 import pickle
+from typing import (
+    cast,
+    Any,
+    Callable,
+    Dict,
+    Generic,
+    IO,
+    Iterator,
+    Optional,
+    Tuple,
+    TypeVar,
+    TYPE_CHECKING,
+    Union,
+)
 
 from pyspark.java_gateway import local_connect_and_auth
 from pyspark.serializers import ChunkedStream, pickle_protocol
 from pyspark.util import print_exec
 
+if TYPE_CHECKING:
+    from pyspark import SparkContext
+
 
 __all__ = ["Broadcast"]
 
+T = TypeVar("T")
+
 
 # Holds broadcasted data received from Java, keyed by its id.
-_broadcastRegistry = {}
+_broadcastRegistry: Dict[int, "Broadcast"] = {}

Review comment:
       Let's avoid implicit `Any` ‒ `Dict[int, "Broadcast[Any]"]`






[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753859161



##########
File path: python/pyspark/broadcast.py
##########
@@ -21,28 +21,47 @@
 from tempfile import NamedTemporaryFile
 import threading
 import pickle
+from typing import (
+    cast,
+    Any,
+    Callable,
+    Dict,
+    Generic,
+    IO,
+    Iterator,
+    Optional,
+    Tuple,
+    TypeVar,
+    TYPE_CHECKING,
+    Union,
+)
 
 from pyspark.java_gateway import local_connect_and_auth
 from pyspark.serializers import ChunkedStream, pickle_protocol
 from pyspark.util import print_exec
 
+if TYPE_CHECKING:
+    from pyspark import SparkContext
+
 
 __all__ = ["Broadcast"]
 
+T = TypeVar("T")
+
 
 # Holds broadcasted data received from Java, keyed by its id.
-_broadcastRegistry = {}
+_broadcastRegistry: Dict[int, "Broadcast"] = {}
 
 
-def _from_id(bid):
+def _from_id(bid: int) -> "Broadcast":

Review comment:
       `-> "Broadcast[Any]"`?






[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753855694



##########
File path: python/pyspark/broadcast.py
##########
@@ -177,28 +201,28 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], T], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
-        return _from_id, (self._jbroadcast.id(),)
+        cast(Any, self._pickle_registry).add(self)
+        return cast(Tuple[Callable[[int], T], Tuple[int]], (_from_id, (self._jbroadcast.id(),)))

Review comment:
       https://github.com/apache/spark/pull/34439/files#r753855676
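For context, a minimal sketch of a `__reduce__` whose return annotation matches `(_from_id, (id,))` directly, so no `cast` is needed. Pickle stores only the `(callable, args)` pair and calls the callable when unpickling. `Handle` and `_registry` are hypothetical stand-ins for `Broadcast` and `_broadcastRegistry`.

```python
from typing import Any, Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")

_registry: Dict[int, "Handle[Any]"] = {}


def _from_id(bid: int) -> "Handle[Any]":
    return _registry[bid]


class Handle(Generic[T]):
    """Hypothetical stand-in: pickles as a registry id rather than its payload."""

    def __init__(self, bid: int, value: T) -> None:
        self.bid = bid
        self.value = value
        _registry[bid] = self

    def __reduce__(self) -> Tuple[Callable[[int], "Handle[Any]"], Tuple[int]]:
        # The tuple literal already matches the annotation; no cast required.
        return _from_id, (self.bid,)


handle = Handle(7, [1, 2])
fn, args = handle.__reduce__()
restored = fn(*args)  # what pickle.loads would do with the reduced pair
```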






[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975347637


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49979/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983246898


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50254/
   




[GitHub] [spark] ueshin commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r759746730



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,
+    ):
         """
         Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
             # we're on the driver.  We want the pickled data to end up in a file (maybe encrypted)
-            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)
+            f = NamedTemporaryFile(delete=False, dir=sc._temp_dir)  # type: ignore[attr-defined]
             self._path = f.name
-            self._sc = sc
-            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)
-            if sc._encryption_enabled:
+            self._sc: Optional["SparkContext"] = sc
+            self._python_broadcast = sc._jvm.PythonRDD.setupBroadcast(self._path)  # type: ignore[attr-defined]
+            if sc._encryption_enabled:  # type: ignore[attr-defined]
                 # with encryption, we ask the jvm to do the encryption for us, we send it data
                 # over a socket
                 port, auth_secret = self._python_broadcast.setupEncryptionServer()
                 (encryption_sock_file, _) = local_connect_and_auth(port, auth_secret)
-                broadcast_out = ChunkedStream(encryption_sock_file, 8192)
+                broadcast_out: Union[ChunkedStream, IO[bytes]] = ChunkedStream(
+                    encryption_sock_file, 8192
+                )
             else:
                 # no encryption, we can just write pickled data directly to the file from python
                 broadcast_out = f
-            self.dump(value, broadcast_out)
-            if sc._encryption_enabled:
+            self.dump(cast(T, value), broadcast_out)

Review comment:
       I'm fine with adding the overloads.

##########
File path: python/pyspark/broadcast.py
##########
@@ -175,27 +205,27 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], "Broadcast[T]"], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
+        cast(Any, self._pickle_registry).add(self)
         return _from_id, (self._jbroadcast.id(),)
 
 
 class BroadcastPickleRegistry(threading.local):
     """Thread-local registry for broadcast variables that have been pickled"""
 
-    def __init__(self):
+    def __init__(self) -> None:
         self.__dict__.setdefault("_registry", set())
 
-    def __iter__(self):
+    def __iter__(self) -> Iterator[Broadcast]:

Review comment:
       `Iterator[Broadcast[Any]]`?

##########
File path: python/pyspark/broadcast.py
##########
@@ -175,27 +205,27 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], "Broadcast[T]"], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
+        cast(Any, self._pickle_registry).add(self)

Review comment:
       `BroadcastPickleRegistry` instead of `Any`?
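A minimal sketch of the difference, with hypothetical `Registry` and `Holder` classes (registering plain ids instead of broadcasts for brevity): casting to the concrete class removes `Optional` but keeps the `.add()` call checked, whereas `cast(Any, ...)` turns checking off. An `assert` narrows `Optional` without a cast at all.

```python
from typing import Optional, Set, cast


class Registry:
    """Hypothetical stand-in for BroadcastPickleRegistry."""

    def __init__(self) -> None:
        self._items: Set[int] = set()

    def add(self, bid: int) -> None:
        self._items.add(bid)


class Holder:
    def __init__(self, registry: Optional[Registry]) -> None:
        self._pickle_registry = registry

    def register_with_cast(self, bid: int) -> None:
        # cast(Registry, ...) drops Optional but, unlike cast(Any, ...),
        # still lets the checker verify the .add() call and its argument type.
        cast(Registry, self._pickle_registry).add(bid)

    def register_with_assert(self, bid: int) -> None:
        # Alternative: narrow Optional with a runtime check instead of a cast.
        assert self._pickle_registry is not None
        self._pickle_registry.add(bid)


reg = Registry()
holder = Holder(reg)
holder.register_with_cast(1)
holder.register_with_assert(2)
```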

##########
File path: python/pyspark/broadcast.py
##########
@@ -175,27 +205,27 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], "Broadcast[T]"], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
+        cast(Any, self._pickle_registry).add(self)
         return _from_id, (self._jbroadcast.id(),)
 
 
 class BroadcastPickleRegistry(threading.local):
     """Thread-local registry for broadcast variables that have been pickled"""
 
-    def __init__(self):
+    def __init__(self) -> None:
         self.__dict__.setdefault("_registry", set())
 
-    def __iter__(self):
+    def __iter__(self) -> Iterator[Broadcast]:
         for bcast in self._registry:
             yield bcast
 
-    def add(self, bcast):
+    def add(self, bcast: Broadcast) -> None:

Review comment:
       `Broadcast[Any]`?
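A sketch of the registry with the suggested `Broadcast[Any]` annotations, using a simplified generic `Broadcast` stand-in. The class-level `_registry` annotation is an addition so a checker knows about the attribute created through `__dict__`; because the class subclasses `threading.local`, each thread gets its own empty set.

```python
import threading
from typing import Any, Generic, Iterator, Set, TypeVar

T = TypeVar("T")


class Broadcast(Generic[T]):
    """Simplified stand-in for pyspark.Broadcast."""

    def __init__(self, value: T) -> None:
        self.value = value


class BroadcastPickleRegistry(threading.local):
    """Thread-local registry for broadcast variables that have been pickled"""

    # Declared for the type checker only; the value is set per thread in __init__.
    _registry: Set["Broadcast[Any]"]

    def __init__(self) -> None:
        self.__dict__.setdefault("_registry", set())

    def __iter__(self) -> Iterator["Broadcast[Any]"]:
        for bcast in self._registry:
            yield bcast

    def add(self, bcast: "Broadcast[Any]") -> None:
        self._registry.add(bcast)


registry = BroadcastPickleRegistry()
registry.add(Broadcast(1))
registry.add(Broadcast("x"))
```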






[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983250327








[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971131891


   **[Test build #145302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145302/testReport)** for PR 34439 at commit [`e304c6c`](https://github.com/apache/spark/commit/e304c6cd391a4f17d8410fc00de90d003d243552).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971172257


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49772/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966045479


   **[Test build #145092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145092/testReport)** for PR 34439 at commit [`1b89310`](https://github.com/apache/spark/commit/1b893106f58b48d4240f94d8866620c0d5fc53e4).




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966106285


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49561/
   




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r781534745



##########
File path: python/pyspark/broadcast.py
##########
@@ -53,18 +51,18 @@
 
 
 # Holds broadcasted data received from Java, keyed by its id.
-_broadcastRegistry: Dict[int, "Broadcast[Any]"] = {}
+_broadcastRegistry: Dict[int, "Broadcast"] = {}
 
 
-def _from_id(bid: int) -> "Broadcast[Any]":
+def _from_id(bid: int) -> "Broadcast":
     from pyspark.broadcast import _broadcastRegistry
 
     if bid not in _broadcastRegistry:
         raise RuntimeError("Broadcast variable '%s' not loaded!" % bid)
     return _broadcastRegistry[bid]
 
 
-class Broadcast(Generic[T]):

Review comment:
       Oh, we cannot do that. `Generic[T]` has to go back.






[GitHub] [spark] HyukjinKwon closed pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34439:
URL: https://github.com/apache/spark/pull/34439


   




[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-1009670737


   Sorry about my mistakes, @zero323. Could you review this PR again? Many thanks :smiling_face_with_tear: 




[GitHub] [spark] dchvn commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r783555825



##########
File path: python/pyspark/broadcast.py
##########
@@ -175,27 +223,27 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], "Broadcast[T]"], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
+        cast("BroadcastPickleRegistry", self._pickle_registry).add(self)

Review comment:
       Updated, thanks! :smile: 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966172142


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49561/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-966075665


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145092/
   




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753850759



##########
File path: python/pyspark/broadcast.py
##########
@@ -113,11 +141,11 @@ def dump(self, value, f):
             raise pickle.PicklingError(msg)
         f.close()
 
-    def load_from_path(self, path):
+    def load_from_path(self, path: Any) -> T:
         with open(path, "rb", 1 << 20) as f:
             return self.load(f)
 
-    def load(self, file):
+    def load(self, file: Any) -> T:

Review comment:
       `path: str`?
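       Assuming `path` is always a filesystem path string, as the suggestion implies, the method can be annotated as below. This is a standalone sketch using plain `pickle`, not the actual `Broadcast` method. Callers that might pass `pathlib.Path` objects would need a wider type.

```python
import os
import pickle
import tempfile
from typing import Any


def load_from_path(path: str) -> Any:
    # 1 << 20 requests a 1 MiB read buffer, as in the diff above.
    with open(path, "rb", 1 << 20) as f:
        return pickle.load(f)


# Round trip through a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pickle.dump([1, 2, 3], tmp)
    tmp_path = tmp.name
try:
    loaded = load_from_path(tmp_path)
finally:
    os.unlink(tmp_path)
```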






[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753851766



##########
File path: python/pyspark/broadcast.py
##########
@@ -102,7 +130,7 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_fi
                 assert path is not None
                 self._path = path
 
-    def dump(self, value, f):
+    def dump(self, value: T, f: Any) -> None:

Review comment:
       `f: BinaryIO`?
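       A standalone sketch of a `dump` typed with `f: BinaryIO`. `pickle.dump` only needs an object with a `write(bytes)` method, and both real binary files and `io.BytesIO` satisfy `BinaryIO` for a type checker. The sketch uses `pickle.HIGHEST_PROTOCOL` in place of the `pickle_protocol` imported in the diff. It also skips the `f.close()` that the method in the diff performs, so the buffer can still be inspected afterwards.

```python
import io
import pickle
from typing import Any, BinaryIO


def dump(value: Any, f: BinaryIO) -> None:
    # Any binary, writable file-like object works here.
    pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)


buf = io.BytesIO()  # io.BytesIO satisfies BinaryIO for type checkers
dump({"a": 1}, buf)
```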






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975342598


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145507/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975342598


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145507/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975400554


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49979/
   




[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971114804


   @ueshin Thanks for reviewing! I updated this PR following your comments.




[GitHub] [spark] zero323 commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-1007619850


   Could you please resolve the conflicts @dchvn?




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983272567


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50254/
   




[GitHub] [spark] dchvn commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r745239197



##########
File path: python/pyspark/broadcast.py
##########
@@ -177,28 +201,28 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], T], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
-        return _from_id, (self._jbroadcast.id(),)
+        cast(Any, self._pickle_registry).add(self)
+        return cast(Tuple[Callable[[int], T], Tuple[int]], (_from_id, (self._jbroadcast.id(),)))

Review comment:
       I use `cast` to match the type hint of this function.
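       For context, here is a minimal sketch of the `__reduce__`-plus-registry pattern, using hypothetical names (`Handle`, `_registry`). The `cast` above is needed because `_from_id` returns a `Broadcast`, not a `T`. If the return annotation names the callable type `_from_id` actually has, the tuple type-checks without a cast. The later revision in this thread does this with `Callable[[int], "Broadcast[T]"]`. `cast` itself is a no-op at runtime.

```python
from typing import Any, Callable, Dict, Tuple, cast

# Hypothetical stand-in for _broadcastRegistry: id -> live object.
_registry: Dict[int, "Handle"] = {}


def _from_id(hid: int) -> "Handle":
    if hid not in _registry:
        raise RuntimeError("Handle '%s' not loaded!" % hid)
    return _registry[hid]


class Handle:
    def __init__(self, hid: int) -> None:
        self.hid = hid
        _registry[hid] = self

    def __reduce__(self) -> Tuple[Callable[[int], "Handle"], Tuple[int]]:
        # On unpickling, pickle calls _from_id(hid). Because the annotation
        # names the real callable type, no cast() is needed here.
        return _from_id, (self.hid,)


h = Handle(7)
fn, args = h.__reduce__()
restored = fn(*args)

# cast() changes nothing at runtime; it only tells the checker what to assume.
raw: Any = h
same = cast(Handle, raw)
```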

##########
File path: python/pyspark/broadcast.py
##########
@@ -177,28 +201,28 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], T], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
-        return _from_id, (self._jbroadcast.id(),)
+        cast(Any, self._pickle_registry).add(self)
+        return cast(Tuple[Callable[[int], T], Tuple[int]], (_from_id, (self._jbroadcast.id(),)))
 
 
 class BroadcastPickleRegistry(threading.local):
     """ Thread-local registry for broadcast variables that have been pickled
     """
 
-    def __init__(self):
+    def __init__(self) -> None:
         self.__dict__.setdefault("_registry", set())
 
-    def __iter__(self):
+    def __iter__(self) -> Generator[Broadcast, None, None]:

Review comment:
       This differs from `broadcast.pyi` because the method uses `yield`, so it returns a `Generator`.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954695002


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144752/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-954684975


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49221/
   




[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-957181189


   CC @HyukjinKwon @zero323 @ueshin too. Many thanks!




[GitHub] [spark] ueshin commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r750715677



##########
File path: python/pyspark/broadcast.py
##########
@@ -21,28 +21,43 @@
 from tempfile import NamedTemporaryFile
 import threading
 import pickle
+from typing import (
+    cast,
+    Any,
+    Callable,
+    Dict,
+    Generator,
+    Generic,
+    IO,
+    Optional,
+    Tuple,
+    TypeVar,
+    Union,
+)
 
 from pyspark.java_gateway import local_connect_and_auth
 from pyspark.serializers import ChunkedStream, pickle_protocol
-from pyspark.util import print_exec
+from pyspark.util import print_exec  # type: ignore[attr-defined]

Review comment:
       I think we don't need this change anymore.

##########
File path: python/pyspark/broadcast.py
##########
@@ -62,7 +77,14 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional[Any] = None,

Review comment:
       I guess `Optional[SparkContext]`?

##########
File path: python/pyspark/broadcast.py
##########
@@ -62,7 +77,14 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional[Any] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional[Any] = None,

Review comment:
       I guess `Optional[BroadcastPickleRegistry]`?
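       A sketch of how both suggestions are commonly written. It assumes that a runtime import of `SparkContext` from `pyspark.broadcast` would be circular. Under that assumption, `SparkContext` is imported only under `TYPE_CHECKING`, and both names are quoted because `BroadcastPickleRegistry` is defined later in the module. `BroadcastSketch` and `RegistrySketch` are hypothetical stand-ins. The later revision of the diff uses the same quoted `Optional["SparkContext"]` and `Optional["BroadcastPickleRegistry"]` form.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only evaluated by type checkers, so pyspark is not imported at runtime
    # and no import cycle can occur.
    from pyspark.context import SparkContext


class BroadcastSketch:
    def __init__(
        self,
        sc: Optional["SparkContext"] = None,
        pickle_registry: Optional["RegistrySketch"] = None,
    ) -> None:
        # Quoted names are forward references: RegistrySketch is defined below,
        # and SparkContext exists only for the type checker.
        self._sc = sc
        self._pickle_registry = pickle_registry


class RegistrySketch:
    pass


b = BroadcastSketch(pickle_registry=RegistrySketch())
```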

##########
File path: python/pyspark/broadcast.py
##########
@@ -177,28 +201,28 @@ def destroy(self, blocking=False):
         self._jbroadcast.destroy(blocking)
         os.unlink(self._path)
 
-    def __reduce__(self):
+    def __reduce__(self) -> Tuple[Callable[[int], T], Tuple[int]]:
         if self._jbroadcast is None:
             raise RuntimeError("Broadcast can only be serialized in driver")
-        self._pickle_registry.add(self)
-        return _from_id, (self._jbroadcast.id(),)
+        cast(Any, self._pickle_registry).add(self)
+        return cast(Tuple[Callable[[int], T], Tuple[int]], (_from_id, (self._jbroadcast.id(),)))
 
 
 class BroadcastPickleRegistry(threading.local):
     """ Thread-local registry for broadcast variables that have been pickled
     """
 
-    def __init__(self):
+    def __init__(self) -> None:
         self.__dict__.setdefault("_registry", set())
 
-    def __iter__(self):
+    def __iter__(self) -> Generator[Broadcast, None, None]:

Review comment:
       I guess we can use `Iterator` instead of `Generator`.
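       A simplified sketch of the registry with the suggested `Iterator` annotation. `PickleRegistry` is a hypothetical stand-in, not the real class. A function containing `yield` returns a generator, and `Generator[X, None, None]` is a subtype of `Iterator[X]`. The shorter annotation therefore type-checks and is all that callers of `__iter__` need.

```python
import threading
from typing import Iterator, Set


class PickleRegistry(threading.local):
    """Hypothetical, simplified stand-in for BroadcastPickleRegistry."""

    _registry: Set[int]  # declared for the type checker only

    def __init__(self) -> None:
        # threading.local gives each thread its own __dict__, so each
        # thread sees its own "_registry" set.
        self.__dict__.setdefault("_registry", set())

    def __iter__(self) -> Iterator[int]:
        # A generator function; Generator[int, None, None] would also be valid.
        for item in self._registry:
            yield item

    def add(self, item: int) -> None:
        self._registry.add(item)

    def clear(self) -> None:
        self._registry.clear()


r = PickleRegistry()
r.add(1)
r.add(2)
```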






[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971161510


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49772/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975290186


   **[Test build #145507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145507/testReport)** for PR 34439 at commit [`c1cb255`](https://github.com/apache/spark/commit/c1cb255a87f219e28315598c8820f4b1c9cdd765).




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r753858983



##########
File path: python/pyspark/broadcast.py
##########
@@ -62,35 +81,44 @@ class Broadcast(object):
     >>> large_broadcast = sc.broadcast(range(10000))
     """
 
-    def __init__(self, sc=None, value=None, pickle_registry=None, path=None, sock_file=None):
+    def __init__(
+        self,
+        sc: Optional["SparkContext"] = None,
+        value: Optional[T] = None,
+        pickle_registry: Optional["BroadcastPickleRegistry"] = None,
+        path: Optional[Any] = None,
+        sock_file: Optional[Any] = None,

Review comment:
       Probably `Optional[BinaryIO]`, but more eyes on this would be good.
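       One way `Optional[BinaryIO]` plays out, sketched as a hypothetical `read_value` function rather than the real constructor. The value is read from `sock_file` when one is given. Otherwise it is read from `path`, mirroring the `assert path is not None` shown earlier in this thread. The `is not None` check and the `assert` are what let a type checker narrow both `Optional` types.

```python
import io
import pickle
from typing import Any, BinaryIO, Optional


def read_value(path: Optional[str] = None, sock_file: Optional[BinaryIO] = None) -> Any:
    if sock_file is not None:
        # Inside this branch a type checker narrows Optional[BinaryIO] to BinaryIO.
        return pickle.load(sock_file)
    # Narrows Optional[str] to str for the open() call below.
    assert path is not None
    with open(path, "rb") as f:
        return pickle.load(f)


value = read_value(sock_file=io.BytesIO(pickle.dumps(42)))
```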






[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-975320529


   **[Test build #145507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145507/testReport)** for PR 34439 at commit [`c1cb255`](https://github.com/apache/spark/commit/c1cb255a87f219e28315598c8820f4b1c9cdd765).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class Broadcast(Generic[T]):`




[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-971172257


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49772/
   




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983224101


   **[Test build #145781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145781/testReport)** for PR 34439 at commit [`1308016`](https://github.com/apache/spark/commit/13080168236f3084cb2250478215acd525b85701).




[GitHub] [spark] SparkQA commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983227579


   **[Test build #145782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145782/testReport)** for PR 34439 at commit [`cc65b8c`](https://github.com/apache/spark/commit/cc65b8c26c9e33ada63ade7218d2ea869010d0a6).




[GitHub] [spark] SparkQA removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983227579


   **[Test build #145782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145782/testReport)** for PR 34439 at commit [`cc65b8c`](https://github.com/apache/spark/commit/cc65b8c26c9e33ada63ade7218d2ea869010d0a6).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983250327








[GitHub] [spark] AmplabJenkins commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-983275972


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50254/
   




[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-992061230


   cc @ueshin @zero323! Please take a look if you have time. Thanks :smiley:




[GitHub] [spark] zero323 commented on a change in pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34439:
URL: https://github.com/apache/spark/pull/34439#discussion_r781534293



##########
File path: python/pyspark/broadcast.py
##########
@@ -53,18 +51,18 @@
 
 
 # Holds broadcasted data received from Java, keyed by its id.
-_broadcastRegistry: Dict[int, "Broadcast[Any]"] = {}
+_broadcastRegistry: Dict[int, "Broadcast"] = {}
 
 
-def _from_id(bid: int) -> "Broadcast[Any]":
+def _from_id(bid: int) -> "Broadcast":
     from pyspark.broadcast import _broadcastRegistry
 
     if bid not in _broadcastRegistry:
         raise RuntimeError("Broadcast variable '%s' not loaded!" % bid)
     return _broadcastRegistry[bid]
 
 
-class Broadcast(Generic[T]):

Review comment:
       Oh, we cannot do that. `Generic[T]` has to go back.
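   Why the class has to stay generic can be shown with a minimal sketch. It assumes nothing about PySpark internals: `MiniBroadcast`, `NonGenericBroadcast`, `from_id` and `can_parameterize` are hypothetical names used only for illustration and are not the actual `pyspark.broadcast` code. Annotations such as `"Broadcast[Any]"` (the lines removed in the diff) only make sense if the class is parameterized. Because they are strings, the interpreter does not evaluate them, so the mismatch surfaces in the type checker. Any non-string subscript of a non-generic class, however, fails at runtime with `TypeError`.

```python
from typing import Any, Dict, Generic, TypeVar

T = TypeVar("T")


class MiniBroadcast(Generic[T]):
    """Hypothetical stand-in for a broadcast wrapper (not pyspark.Broadcast)."""

    def __init__(self, value: T) -> None:
        self._value = value

    @property
    def value(self) -> T:
        # A type checker sees this as T, e.g. int for MiniBroadcast[int].
        return self._value


class NonGenericBroadcast:
    """Same wrapper without Generic[T]: the element type is lost."""

    def __init__(self, value: Any) -> None:
        self._value = value

    @property
    def value(self) -> Any:
        return self._value


# Registry keyed by id, mirroring the `Dict[int, "Broadcast[Any]"]` line in the diff.
_registry: Dict[int, "MiniBroadcast[Any]"] = {}


def from_id(bid: int) -> "MiniBroadcast[Any]":
    # Hypothetical analogue of _from_id: fail loudly on unknown ids.
    if bid not in _registry:
        raise RuntimeError("Broadcast variable '%s' not loaded!" % bid)
    return _registry[bid]


def can_parameterize(cls: Any) -> bool:
    """Return True if cls[int] is valid at runtime."""
    try:
        cls[int]
        return True
    except TypeError:
        return False


# This annotation is evaluated at runtime and only works because the class is generic.
b: MiniBroadcast[int] = MiniBroadcast(42)
_registry[0] = b
```

   Here `can_parameterize(MiniBroadcast)` is True while `can_parameterize(NonGenericBroadcast)` is False. A checker such as mypy infers `b.value` as `int`, whereas the non-generic variant can only offer `Any`. `from_id(0).value` returns 42, and `from_id(1)` raises `RuntimeError`.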






[GitHub] [spark] dchvn commented on pull request #34439: [SPARK-37095][PYTHON] Inline type hints for files in python/pyspark/broadcast.py

Posted by GitBox <gi...@apache.org>.
dchvn commented on pull request #34439:
URL: https://github.com/apache/spark/pull/34439#issuecomment-1011947810


   Thanks all! :smile: 

