
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/08 21:07:59 UTC

[GitHub] [spark] ueshin opened a new pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

ueshin opened a new pull request #34225:
URL: https://github.com/apache/spark/pull/34225


   ### What changes were proposed in this pull request?
   
   Inline type hints from `python/pyspark/sql/dataframe.pyi` to `python/pyspark/sql/dataframe.py`.
   
   ### Why are the changes needed?
   
   Currently, there is a type hint stub file, `python/pyspark/sql/dataframe.pyi`, that shows the expected types for functions, but we can also take advantage of static type checking within the functions by inlining the type hints.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725459921



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
                 raise TypeError("value argument is required when to_replace is not a dictionary.")
 
         # Helper functions
-        def all_of(types):
+        def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:

Review comment:
       This is OK, as long as we don't need `[]`. In fact, we'll have to go the other way in the future as `typing` variants having `collections.abc` counterparts are deprecated in 3.9 ([PEP 585](https://www.python.org/dev/peps/pep-0585/)) 





[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940370666


   **[Test build #144098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144098/testReport)** for PR 34225 at commit [`5d15bf5`](https://github.com/apache/spark/commit/5d15bf5f5468e320b1c57d7345ce227d693b2080).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940404634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144098/
   



[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725380604



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
         else:
             raise TypeError("numPartitions should be an int or Column")
 
-    def repartitionByRange(self, numPartitions, *cols):
+    @overload
+    def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    @overload
+    def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    def repartitionByRange(  # type: ignore[misc]
+        self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"

Review comment:
       I am curious whether `numPartitions` being `ColumnOrName` will be shown externally.





[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725383545



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
                 raise TypeError("value argument is required when to_replace is not a dictionary.")
 
         # Helper functions
-        def all_of(types):
+        def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:

Review comment:
       Should this be `typing.Iterable`?





[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940401435


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48576/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939136345


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144037/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940370666


   **[Test build #144098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144098/testReport)** for PR 34225 at commit [`5d15bf5`](https://github.com/apache/spark/commit/5d15bf5f5468e320b1c57d7345ce227d693b2080).



[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939156225


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48514/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940404634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144098/
   



[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725458949



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
         else:
             raise TypeError("numPartitions should be an int or Column")
 
-    def repartitionByRange(self, numPartitions, *cols):
+    @overload
+    def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    @overload
+    def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    def repartitionByRange(  # type: ignore[misc]
+        self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"

Review comment:
       But we don't really support `numPartitions` as a `Column`, do we?





[GitHub] [spark] SparkQA removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939117368


   **[Test build #144037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144037/testReport)** for PR 34225 at commit [`f366fc8`](https://github.com/apache/spark/commit/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122).



[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726602528



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
                 raise TypeError("value argument is required when to_replace is not a dictionary.")
 
         # Helper functions
-        def all_of(types):
+        def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:

Review comment:
       That's good to know, thanks!





[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725459974



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
                 raise TypeError("value argument is required when to_replace is not a dictionary.")
 
         # Helper functions
-        def all_of(types):
+        def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:

Review comment:
       This is OK, as long as we don't need `[]`. In fact, we'll have to go the other way in the future as `typing` variants having `collections.abc` counterparts are deprecated in 3.9 ([PEP 585](https://www.python.org/dev/peps/pep-0585/)) 
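
A minimal sketch of the point above, assuming the helper only needs an `isinstance` check over every element (the body of `all_of_` is an assumption; the diff shows only the signature). Because `Iterable` is used without `[]`, the `collections.abc` class can appear in the annotation as-is, while `typing.Callable` still supplies the subscripted return type:

```python
from collections.abc import Iterable  # unsubscripted, so no typing.Iterable needed
from typing import Callable, Tuple, Type, Union


def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:
    """Build a predicate that is true when every element is an instance of `types`."""

    def all_of_(xs: Iterable) -> bool:
        # Assumed body: a plain isinstance check over the iterable.
        return all(isinstance(x, types) for x in xs)

    return all_of_


all_of_str = all_of(str)
all_of_numeric = all_of((int, float))

print(all_of_str(["a", "b"]))    # every element is a str
print(all_of_numeric([1, 2.5]))  # every element is an int or a float
print(all_of_str(["a", 1]))      # mixed types fail the check
```

Subscripting the `collections.abc` classes themselves (for example `Iterable[str]`) is what PEP 585 enables from Python 3.9 onward, which is why the unsubscripted form is the safe choice for older interpreters.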





[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725408880



##########
File path: python/pyspark/sql/__init__.pyi
##########
@@ -29,7 +29,7 @@ from pyspark.sql.dataframe import (  # noqa: F401
     DataFrameStatFunctions as DataFrameStatFunctions,
 )
 from pyspark.sql.group import GroupedData as GroupedData  # noqa: F401
-from pyspark.sql.observation import Observation  # noqa: F401
+from pyspark.sql.observation import Observation as Observation # noqa: F401

Review comment:
       If it is in `__all__` of `__init__.py` then it should be here, since we import directly from `pyspark.sql`:
   
   https://github.com/apache/spark/blob/07ecbc4049aa7f8daa11e6a924c37c1db2f53c73/python/pyspark/sql/dataframe.py#L1994
   
   But it won't be necessary once #34203 is merged.
   
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939136345


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144037/
   



[GitHub] [spark] HyukjinKwon commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940544565


   Merged to master.



[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725457655



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
         else:
             raise TypeError("numPartitions should be an int or Column")
 
-    def repartitionByRange(self, numPartitions, *cols):
+    @overload
+    def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    @overload
+    def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    def repartitionByRange(  # type: ignore[misc]
+        self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"

Review comment:
       It is not: 
   
   ```python
   # test.py
   from pyspark.sql import DataFrame
   
   reveal_type(DataFrame.repartitionByRange)
   ```
   
   ```
   test.py:4: note: Revealed type is "Overload(def (self: pyspark.sql.dataframe.DataFrame, numPartitions: builtins.int, *cols: Union[pyspark.sql.column.Column, builtins.str]) -> pyspark.sql.dataframe.DataFrame, def (self: pyspark.sql.dataframe.DataFrame, *cols: Union[pyspark.sql.column.Column, builtins.str]) -> pyspark.sql.dataframe.DataFrame)"
   ```





[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939117368


   **[Test build #144037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144037/testReport)** for PR 34225 at commit [`f366fc8`](https://github.com/apache/spark/commit/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122).



[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939136361


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48514/
   



[GitHub] [spark] HyukjinKwon closed pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

HyukjinKwon closed pull request #34225:
URL: https://github.com/apache/spark/pull/34225


   



[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725374295



##########
File path: python/pyspark/sql/__init__.pyi
##########
@@ -29,7 +29,7 @@ from pyspark.sql.dataframe import (  # noqa: F401
     DataFrameStatFunctions as DataFrameStatFunctions,
 )
 from pyspark.sql.group import GroupedData as GroupedData  # noqa: F401
-from pyspark.sql.observation import Observation  # noqa: F401
+from pyspark.sql.observation import Observation as Observation # noqa: F401

Review comment:
       What would happen if we don't add `as Observation`?





[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725369409



##########
File path: python/pyspark/__init__.pyi
##########
@@ -71,7 +71,7 @@ def since(version: Union[str, float]) -> Callable[[T], T]: ...
 def copy_func(
     f: F,
     name: Optional[str] = ...,
-    sinceversion: Optional[str] = ...,
+    sinceversion: Optional[Union[str, float]] = ...,

Review comment:
       Great! My ongoing PR needs the change as well! 👍 





[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725381412



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -1017,7 +1082,7 @@ def repartitionByRange(self, numPartitions, *cols):
         """
         if isinstance(numPartitions, int):
             if len(cols) == 0:
-                return ValueError("At least one partition-by expression must be specified.")
+                raise ValueError("At least one partition-by expression must be specified.")

Review comment:
       Good catch!
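
To illustrate why the one-word change in the diff matters: `return ValueError(...)` hands the exception object back to the caller as an ordinary value, so nothing fails at the call site. The sketch below uses two hypothetical stand-in functions (`check_returning` and `check_raising` are not part of PySpark) that mirror the removed and added lines:

```python
MESSAGE = "At least one partition-by expression must be specified."


def check_returning(cols: tuple) -> object:
    # Mirrors the removed line: the exception is constructed but never raised.
    if len(cols) == 0:
        return ValueError(MESSAGE)
    return cols


def check_raising(cols: tuple) -> tuple:
    # Mirrors the added line: the caller is interrupted immediately.
    if len(cols) == 0:
        raise ValueError(MESSAGE)
    return cols


result = check_returning(())
# The caller silently receives an exception instance instead of a result.
print(type(result).__name__)

try:
    check_raising(())
except ValueError as exc:
    print(f"raised: {exc}")
```

A return annotation is one way such a slip can surface: once the function is declared to return a specific result type, a static checker can flag a `return` of an exception instance as incompatible.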





[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939128848


   **[Test build #144037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144037/testReport)** for PR 34225 at commit [`f366fc8`](https://github.com/apache/spark/commit/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] ueshin commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

ueshin commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726491351



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
         else:
             raise TypeError("numPartitions should be an int or Column")
 
-    def repartitionByRange(self, numPartitions, *cols):
+    @overload
+    def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    @overload
+    def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    def repartitionByRange(  # type: ignore[misc]
+        self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"

Review comment:
       At runtime, when we call something like `sdf.repartitionByRange('col1', 'col2', ...)`, `numPartitions` will be a `ColumnOrName`.
   In fact, the function body checks whether it is a `str` or a `Column`:
   https://github.com/apache/spark/blob/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122/python/pyspark/sql/dataframe.py#L1089-L1091
   
   Even though this signature won't be shown externally, we need it so that `mypy` checks the function body properly.
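
The overload pattern discussed in this comment can be sketched in isolation. This is a minimal illustration, not the code from the PR: `Frame`, the stand-in `Column` class, the `_name` helper, the tuple return value, and the error messages are all hypothetical, chosen so the example runs without Spark. The two `@overload` signatures are what callers see, while the implementation signature widens `numPartitions` to `Union[int, ColumnOrName]` so that a type checker can verify the `isinstance` branches in the body.

```python
from typing import List, Optional, Tuple, Union, overload


class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""

    def __init__(self, name: str) -> None:
        self.name = name


ColumnOrName = Union[Column, str]


def _name(col: ColumnOrName) -> str:
    # Hypothetical helper: reduce a Column or a plain string to its name.
    return col.name if isinstance(col, Column) else col


class Frame:
    """Hypothetical stand-in for DataFrame; it only records how it was called."""

    # Public signature 1: an explicit partition count followed by columns.
    @overload
    def repartitionByRange(
        self, numPartitions: int, *cols: ColumnOrName
    ) -> Tuple[Optional[int], List[str]]: ...

    # Public signature 2: columns only.
    @overload
    def repartitionByRange(
        self, *cols: ColumnOrName
    ) -> Tuple[Optional[int], List[str]]: ...

    # Implementation: the first positional argument may really be a column,
    # so its annotation is the union of both cases.
    def repartitionByRange(  # type: ignore[misc]
        self, numPartitions: Union[int, ColumnOrName], *cols: ColumnOrName
    ) -> Tuple[Optional[int], List[str]]:
        if isinstance(numPartitions, int):
            if len(cols) == 0:
                raise ValueError("at least one column is required")
            return numPartitions, [_name(c) for c in cols]
        elif isinstance(numPartitions, (str, Column)):
            # The "count" slot actually holds the first column.
            all_cols = (numPartitions,) + cols
            return None, [_name(c) for c in all_cols]
        else:
            raise TypeError("numPartitions should be an int, str or Column")
```

With this sketch, `Frame().repartitionByRange("a", "b")` takes the second branch because the first positional argument is bound to `numPartitions`, which is the situation the comment describes.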





[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940388116


   **[Test build #144098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144098/testReport)** for PR 34225 at commit [`5d15bf5`](https://github.com/apache/spark/commit/5d15bf5f5468e320b1c57d7345ce227d693b2080).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class Database(NamedTuple):`
     * `class Table(NamedTuple):`
     * `class Column(NamedTuple):`
     * `class Function(NamedTuple):`
     * `  protected class YarnSchedulerEndpoint(override val rpcEnv: RpcEnv)`
     * `class IndexAlreadyExistsException(message: String, cause: Option[Throwable] = None)`
     * `case class SetCatalogAndNamespace(child: LogicalPlan) extends UnaryCommand `
     * `case class SetNamespaceCommand(namespace: Seq[String]) extends LeafRunnableCommand `
     * `case class HashedRelationBroadcastMode(key: Seq[Expression], isNullAware: Boolean = false)`



[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726591383



##########
File path: python/pyspark/sql/__init__.pyi
##########
@@ -29,7 +29,7 @@ from pyspark.sql.dataframe import (  # noqa: F401
     DataFrameStatFunctions as DataFrameStatFunctions,
 )
 from pyspark.sql.group import GroupedData as GroupedData  # noqa: F401
-from pyspark.sql.observation import Observation  # noqa: F401
+from pyspark.sql.observation import Observation as Observation # noqa: F401

Review comment:
       Thanks!





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940439797


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48576/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940439797


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48576/
   



[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940428416


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48576/
   



[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939155327


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48514/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939156225


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48514/
   



[GitHub] [spark] ueshin commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

Posted by GitBox <gi...@apache.org>.

ueshin commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939116934


   cc @zero323 @xinrong-databricks @HyukjinKwon @itholic

