Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/08 21:07:59 UTC
[GitHub] [spark] ueshin opened a new pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
ueshin opened a new pull request #34225:
URL: https://github.com/apache/spark/pull/34225
### What changes were proposed in this pull request?
Inline type hints from `python/pyspark/sql/dataframe.pyi` to `python/pyspark/sql/dataframe.py`.
### Why are the changes needed?
Currently, there is a type hint stub file `python/pyspark/sql/dataframe.pyi` that shows the expected types for functions, but we can also take advantage of static type checking within the functions by inlining the type hints.
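As a rough illustration of the difference, here is a minimal sketch using a hypothetical `limit_rows` helper (an assumption for illustration, not anything from `dataframe.py`). With a stub, the signature sits in a separate `.pyi` file and the unannotated body in the `.py` file is skipped by `mypy` by default; with inline hints the body itself is checked.

```python
from typing import List, Optional


# Hypothetical helper, for illustration only; it is not part of pyspark.
# With a stub, the signature would live in a separate .pyi file as
#     def limit_rows(rows: List[str], n: Optional[int] = ...) -> List[str]: ...
# and the body in the .py file would carry no annotations.
def limit_rows(rows: List[str], n: Optional[int] = None) -> List[str]:
    # Because the annotations are inline, a checker can verify that `n`
    # is narrowed from Optional[int] to int before it is used as a slice bound.
    if n is None:
        return list(rows)
    return rows[:n]
```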
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725459921
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
raise TypeError("value argument is required when to_replace is not a dictionary.")
# Helper functions
- def all_of(types):
+ def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:
Review comment:
This is OK, as long as we don't need `[]`. In fact, we'll have to go the other way in the future as `typing` variants having `collections.abc` counterparts are deprecated in 3.9 ([PEP 585](https://www.python.org/dev/peps/pep-0585/))
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940370666
**[Test build #144098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144098/testReport)** for PR 34225 at commit [`5d15bf5`](https://github.com/apache/spark/commit/5d15bf5f5468e320b1c57d7345ce227d693b2080).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940404634
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144098/
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725380604
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
else:
raise TypeError("numPartitions should be an int or Column")
- def repartitionByRange(self, numPartitions, *cols):
+ @overload
+ def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ @overload
+ def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ def repartitionByRange( # type: ignore[misc]
+ self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"
Review comment:
I am curious whether `numPartitions` being `ColumnOrName` will be shown externally.
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725383545
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
raise TypeError("value argument is required when to_replace is not a dictionary.")
# Helper functions
- def all_of(types):
+ def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:
Review comment:
Should this be `typing.Iterable`?
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940401435
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48576/
[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939136345
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144037/
[GitHub] [spark] SparkQA removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940370666
**[Test build #144098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144098/testReport)** for PR 34225 at commit [`5d15bf5`](https://github.com/apache/spark/commit/5d15bf5f5468e320b1c57d7345ce227d693b2080).
[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939156225
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48514/
[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940404634
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144098/
[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725458949
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
else:
raise TypeError("numPartitions should be an int or Column")
- def repartitionByRange(self, numPartitions, *cols):
+ @overload
+ def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ @overload
+ def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ def repartitionByRange( # type: ignore[misc]
+ self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"
Review comment:
But we don't really support `numPartitions` as a `Column`, do we?
[GitHub] [spark] SparkQA removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939117368
**[Test build #144037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144037/testReport)** for PR 34225 at commit [`f366fc8`](https://github.com/apache/spark/commit/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122).
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726602528
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
raise TypeError("value argument is required when to_replace is not a dictionary.")
# Helper functions
- def all_of(types):
+ def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:
Review comment:
That's good to know, thanks!
[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725459974
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2348,7 +2569,7 @@ def replace(self, to_replace, value=_NoValue, subset=None):
raise TypeError("value argument is required when to_replace is not a dictionary.")
# Helper functions
- def all_of(types):
+ def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:
Review comment:
This is OK, as long as we don't need `[]`. In fact, we'll have to go the other way in the future as `typing` variants having `collections.abc` counterparts are deprecated in 3.9 ([PEP 585](https://www.python.org/dev/peps/pep-0585/))
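To make the point concrete, here is a minimal sketch of the helper annotated with the bare `collections.abc.Iterable`. The body is an assumption (the diff above shows only the signature). The unsubscripted form works as an annotation on any Python 3 version; subscripting it at runtime (`Iterable[...]`) is what needs 3.9 or later.

```python
from collections.abc import Iterable
from typing import Callable, Tuple, Type, Union


def all_of(types: Union[Type, Tuple[Type, ...]]) -> Callable[[Iterable], bool]:
    # Assumed body: the diff shows only the signature of this helper.
    def all_of_(xs: Iterable) -> bool:
        return all(isinstance(x, types) for x in xs)

    return all_of_


# Build predicates for a single type and for a tuple of types.
all_of_str = all_of(str)
all_of_numeric = all_of((int, float))
```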
[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725408880
##########
File path: python/pyspark/sql/__init__.pyi
##########
@@ -29,7 +29,7 @@ from pyspark.sql.dataframe import ( # noqa: F401
DataFrameStatFunctions as DataFrameStatFunctions,
)
from pyspark.sql.group import GroupedData as GroupedData # noqa: F401
-from pyspark.sql.observation import Observation # noqa: F401
+from pyspark.sql.observation import Observation as Observation # noqa: F401
Review comment:
If it is in `__all__` of `__init__.py` then it should be here, since we import directly from `pyspark.sql`:
https://github.com/apache/spark/blob/07ecbc4049aa7f8daa11e6a924c37c1db2f53c73/python/pyspark/sql/dataframe.py#L1994
But it won't be necessary once #34203 is merged.
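For context, PEP 484 specifies that names imported into a stub file are not re-exported unless the import uses the redundant-alias form `from m import X as X` (or a star import). The sketch below uses standard-library names in place of `Observation`, purely as an assumption for illustration. At runtime both forms bind the same object; the distinction only matters to a type checker reading a `.pyi` file.

```python
import collections

# Plain import: in a .pyi stub, a checker treats this name as private to the module.
from collections import OrderedDict

# Redundant alias: in a .pyi stub, this marks the name as an intentional re-export.
from collections import Counter as Counter

# At runtime there is no difference between the two import forms.
same_ordered_dict = OrderedDict is collections.OrderedDict
same_counter = Counter is collections.Counter
```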
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939136345
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144037/
[GitHub] [spark] HyukjinKwon commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940544565
Merged to master.
[GitHub] [spark] zero323 commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725457655
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
else:
raise TypeError("numPartitions should be an int or Column")
- def repartitionByRange(self, numPartitions, *cols):
+ @overload
+ def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ @overload
+ def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ def repartitionByRange( # type: ignore[misc]
+ self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"
Review comment:
It is not:
```python
# test.py
from pyspark.sql import DataFrame
reveal_type(DataFrame.repartitionByRange)
```
```
test.py:4: note: Revealed type is "Overload(def (self: pyspark.sql.dataframe.DataFrame, numPartitions: builtins.int, *cols: Union[pyspark.sql.column.Column, builtins.str]) -> pyspark.sql.dataframe.DataFrame, def (self: pyspark.sql.dataframe.DataFrame, *cols: Union[pyspark.sql.column.Column, builtins.str]) -> pyspark.sql.dataframe.DataFrame)"
```
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939117368
**[Test build #144037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144037/testReport)** for PR 34225 at commit [`f366fc8`](https://github.com/apache/spark/commit/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122).
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939136361
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48514/
[GitHub] [spark] HyukjinKwon closed pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34225:
URL: https://github.com/apache/spark/pull/34225
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725374295
##########
File path: python/pyspark/sql/__init__.pyi
##########
@@ -29,7 +29,7 @@ from pyspark.sql.dataframe import ( # noqa: F401
DataFrameStatFunctions as DataFrameStatFunctions,
)
from pyspark.sql.group import GroupedData as GroupedData # noqa: F401
-from pyspark.sql.observation import Observation # noqa: F401
+from pyspark.sql.observation import Observation as Observation # noqa: F401
Review comment:
What would happen if we don't add `as Observation`?
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725369409
##########
File path: python/pyspark/__init__.pyi
##########
@@ -71,7 +71,7 @@ def since(version: Union[str, float]) -> Callable[[T], T]: ...
def copy_func(
f: F,
name: Optional[str] = ...,
- sinceversion: Optional[str] = ...,
+ sinceversion: Optional[Union[str, float]] = ...,
Review comment:
Great! My ongoing PR needs the change as well! 👍
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r725381412
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -1017,7 +1082,7 @@ def repartitionByRange(self, numPartitions, *cols):
"""
if isinstance(numPartitions, int):
if len(cols) == 0:
- return ValueError("At least one partition-by expression must be specified.")
+ raise ValueError("At least one partition-by expression must be specified.")
Review comment:
Good catch!
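For readers skimming the diff, a small sketch of why this one-word change matters. The two function names are hypothetical; only the error message is taken from the hunk above. Returning an exception hands it back as an ordinary value, so the caller carries on, and an inline return annotation gives a checker the chance to flag the mismatch.

```python
def check_returning(cols: tuple):
    if len(cols) == 0:
        # Bug: the exception object is created and handed back as a value.
        return ValueError("At least one partition-by expression must be specified.")
    return cols


def check_raising(cols: tuple) -> tuple:
    if len(cols) == 0:
        raise ValueError("At least one partition-by expression must be specified.")
    return cols


result = check_returning(())
# `result` is an exception instance, not a raised error, so execution continues.
returned_an_exception = isinstance(result, ValueError)

try:
    check_raising(())
    raised = False
except ValueError:
    raised = True
```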
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939128848
**[Test build #144037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144037/testReport)** for PR 34225 at commit [`f366fc8`](https://github.com/apache/spark/commit/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] ueshin commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726491351
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
else:
raise TypeError("numPartitions should be an int or Column")
- def repartitionByRange(self, numPartitions, *cols):
+ @overload
+ def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ @overload
+ def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ def repartitionByRange( # type: ignore[misc]
+ self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"
Review comment:
At runtime, when we call something like `sdf.repartitionByRange('col1', 'col2', ...)`, the `numPartitions` argument will be a `ColumnOrName`.
In fact, the body checks whether it's a `str` or a `Column`:
https://github.com/apache/spark/blob/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122/python/pyspark/sql/dataframe.py#L1089-L1091
Even though this won't be shown externally, we need it to make `mypy` check the function body properly.
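The pattern can be sketched in isolation as follows. This is a simplified stand-in, not the actual pyspark code: `split_args` is a hypothetical function, and plain `str` replaces `ColumnOrName` so the example needs no Spark import. Callers see only the two overloads, while the implementation signature is wider so that both `isinstance` branches type-check.

```python
from typing import Optional, Tuple, Union, overload


@overload
def split_args(num: int, *cols: str) -> Tuple[Optional[int], Tuple[str, ...]]: ...


@overload
def split_args(*cols: str) -> Tuple[Optional[int], Tuple[str, ...]]: ...


def split_args(  # type: ignore[misc]
    num: Union[int, str], *cols: str
) -> Tuple[Optional[int], Tuple[str, ...]]:
    # Callers only see the two overloads above. The wider Union on `num`
    # exists so the checker can follow both isinstance branches below.
    if isinstance(num, int):
        if len(cols) == 0:
            raise ValueError("At least one partition-by expression must be specified.")
        return num, cols
    elif isinstance(num, str):
        # Called as split_args("col1", "col2"): `num` is really the first column.
        return None, (num,) + cols
    else:
        raise TypeError("num should be an int or str")
```

Calling `split_args(4, "a", "b")` takes the first branch, and `split_args("a", "b")` takes the second, with the first positional argument folded back into the column tuple.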
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940388116
**[Test build #144098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144098/testReport)** for PR 34225 at commit [`5d15bf5`](https://github.com/apache/spark/commit/5d15bf5f5468e320b1c57d7345ce227d693b2080).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class Database(NamedTuple):`
  * `class Table(NamedTuple):`
  * `class Column(NamedTuple):`
  * `class Function(NamedTuple):`
  * ` protected class YarnSchedulerEndpoint(override val rpcEnv: RpcEnv)`
  * `class IndexAlreadyExistsException(message: String, cause: Option[Throwable] = None)`
  * `case class SetCatalogAndNamespace(child: LogicalPlan) extends UnaryCommand `
  * `case class SetNamespaceCommand(namespace: Seq[String]) extends LeafRunnableCommand `
  * `case class HashedRelationBroadcastMode(key: Seq[Expression], isNullAware: Boolean = false)`
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726591383
##########
File path: python/pyspark/sql/__init__.pyi
##########
@@ -29,7 +29,7 @@ from pyspark.sql.dataframe import ( # noqa: F401
DataFrameStatFunctions as DataFrameStatFunctions,
)
from pyspark.sql.group import GroupedData as GroupedData # noqa: F401
-from pyspark.sql.observation import Observation # noqa: F401
+from pyspark.sql.observation import Observation as Observation # noqa: F401
Review comment:
Thanks!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940439797
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48576/
[GitHub] [spark] AmplabJenkins commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940439797
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48576/
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-940428416
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48576/
[GitHub] [spark] SparkQA commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939155327
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48514/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939156225
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48514/
[GitHub] [spark] ueshin commented on pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe
Posted by GitBox <gi...@apache.org>.
ueshin commented on pull request #34225:
URL: https://github.com/apache/spark/pull/34225#issuecomment-939116934
cc @zero323 @xinrong-databricks @HyukjinKwon @itholic