Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/29 02:47:30 UTC
[GitHub] [spark] ueshin opened a new pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
ueshin opened a new pull request #34136:
URL: https://github.com/apache/spark/pull/34136
### What changes were proposed in this pull request?
Inline type hints from `python/pyspark/sql/session.pyi` to `python/pyspark/sql/session.py`.
### Why are the changes needed?
Currently, the type hint stub file `python/pyspark/sql/session.pyi` shows the expected types for functions, but by inlining the type hints we can also take advantage of static type checking within the function bodies.
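As a minimal sketch of the difference (the function `make_label` is hypothetical and not part of `pyspark`): with inline hints, a checker such as mypy verifies the function body as well as the call sites, and the annotations are visible at runtime.

```python
from typing import Optional, get_type_hints


def make_label(name: str, suffix: Optional[str] = None) -> str:
    # Hypothetical example function, not taken from pyspark.
    # With inline hints, a checker such as mypy also verifies this body,
    # e.g. that `suffix` is narrowed from Optional[str] before concatenation.
    if suffix is None:
        return name
    return name + "_" + suffix


# Inline annotations are attached to the function object itself,
# whereas hints kept only in a .pyi stub are not available at runtime.
hints = get_type_hints(make_label)
```

With a stub-only setup, the same signature would live in a separate `.pyi` file and the body above would not be checked against it.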
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932645923
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48306/
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718959247
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
Thanks for clarifications!
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718906758
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
> @zero323 May I ask you to fix the missing variants?
On it
[GitHub] [spark] ueshin commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930492130
cc @xinrong-databricks @HyukjinKwon @itholic
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718948382
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
So wouldn't it make more sense to skip annotations here completely?
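A minimal sketch of what skipping annotations on the implementation could look like, using a hypothetical `describe` function rather than `createDataFrame`. Callers are still checked against the `@overload` signatures; the trade-off is that mypy treats the unannotated implementation as dynamically typed and does not check its body unless `--check-untyped-defs` is enabled.

```python
from typing import List, overload


@overload
def describe(data: int) -> str:
    ...


@overload
def describe(data: List[int], sep: str = ...) -> str:
    ...


def describe(data, sep=","):
    # Hypothetical implementation, left unannotated on purpose:
    # call sites are checked against the overloads above,
    # but this body is not type checked by default.
    if isinstance(data, int):
        return str(data)
    return sep.join(str(x) for x in data)
```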
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718907290
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
Thanks!
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806472
**[Test build #143698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143698/testReport)** for PR 34136 at commit [`9b4977d`](https://github.com/apache/spark/commit/9b4977dbef9a10c0a09cb11f7aa1c3c7029b6900).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718960289
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
Thank YOU for asking!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930615886
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48242/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933891517
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48333/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932641942
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48306/
[GitHub] [spark] xinrong-databricks commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r719804759
##########
File path: python/pyspark/sql/session.py
##########
@@ -525,22 +584,25 @@ def _createFromLocal(self, data, schema):
if schema is None or isinstance(schema, (list, tuple)):
struct = self._inferSchemaFromList(data, names=schema)
converter = _create_converter(struct)
- data = map(converter, data)
+ tupled_data = map(converter, data) # type: Iterable[Tuple]
Review comment:
I'm curious why use `# type: a_type` rather than `typing.cast`?
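For context, a small sketch of the options being compared, under the assumption that `to_tuple` is a hypothetical stand-in for the converter in the diff. A type comment (or the equivalent variable annotation) declares the variable's type and the checker verifies the assigned value against it, whereas `typing.cast` asks the checker to trust the stated type without verifying it.

```python
from typing import Iterable, Tuple, cast


def to_tuple(row: list) -> Tuple:
    # Hypothetical stand-in for the converter used in the diff.
    return tuple(row)


rows = [[1, "a"], [2, "b"]]

# 1. Type comment, as in the diff: declares the variable's type, and the
#    checker verifies that the assigned value is compatible with it.
tupled_a = map(to_tuple, rows)  # type: Iterable[Tuple]

# 2. Variable annotation (PEP 526): same meaning, newer syntax.
tupled_b: Iterable[Tuple] = map(to_tuple, rows)

# 3. cast: the checker trusts the stated type without verifying it;
#    at runtime cast simply returns its argument unchanged.
source = map(to_tuple, rows)
tupled_c = cast(Iterable[Tuple], source)
```

None of the three forms changes runtime behaviour; they differ only in how much the type checker verifies.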
[GitHub] [spark] HyukjinKwon commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937391391
Let me merge this to go forward but please let me know if there are more things to fix up together @zero323 🙏
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937677264
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143922/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471352
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48426/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930601800
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48241/
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718842692
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
In general, I am not sure if it makes sense to annotate this here. But if we do, it should be consistent with its RDD counterpart
https://github.com/apache/spark/blob/e79dd89cf6b513264d8205df1d4561cb07406d79/python/pyspark/rdd.pyi#L445-L452
On a side note, we're missing `schema: str` variants, if I am not mistaken.
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930553559
**[Test build #143730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143730/testReport)** for PR 34136 at commit [`46a0f94`](https://github.com/apache/spark/commit/46a0f94efc5886e3c523b2648f76a17d51cc3f17).
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930615886
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48242/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675771
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48444/
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48426/
[GitHub] [spark] HyukjinKwon commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937391574
Merged to master.
[GitHub] [spark] xinrong-databricks commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-931744148
LGTM, thanks!
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718937128
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:
Review comment:
Would you mind explaining what the intention is here? Adding `RowLike` to the supported type params and `StructType` to the supported schemas seems to miss the point of having this annotation (I assume the ignore is due to overlap with the previous annotations).
In general this one
https://github.com/apache/spark/blob/aa9064ad96ff7cefaa4381e912608b0b0d39a09c/python/pyspark/sql/session.pyi#L89-L97
was added to support invocations like:
```python
spark.createDataFrame([1], IntegerType())
```
but reject
```python
spark.createDataFrame([(1, 2)], IntegerType())
```
with
```
error: List item 0 has incompatible type "Tuple[int, int]"; expected "Union[date, float, str, Decimal]"
```
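To make the overload discussion above concrete, here is a minimal sketch that does not depend on PySpark. `AtomicSchema`, `StructSchema`, `LiteralValue`, `RowValue`, and `create_frame` are hypothetical stand-ins, not the real `pyspark.sql` names, and the overload set is deliberately reduced to two cases. Under those assumptions it illustrates how `@overload` signatures constrain callers, while the wider implementation signature is what the checker uses for the function body.

```python
from typing import Iterable, Tuple, Union, overload


class AtomicSchema:
    """Hypothetical stand-in for an atomic type such as IntegerType."""


class StructSchema:
    """Hypothetical stand-in for StructType."""


LiteralValue = Union[int, float, str]
RowValue = Tuple[object, ...]


@overload
def create_frame(data: Iterable[LiteralValue], schema: AtomicSchema) -> str: ...
@overload
def create_frame(data: Iterable[RowValue], schema: StructSchema) -> str: ...


def create_frame(
    data: Union[Iterable[LiteralValue], Iterable[RowValue]],
    schema: Union[AtomicSchema, StructSchema],
) -> str:
    # Callers are matched against the overloads above; this wider signature
    # is only what the type checker uses when checking the body.
    kind = "atomic" if isinstance(schema, AtomicSchema) else "struct"
    return f"{kind}:{len(list(data))}"


create_frame([1], AtomicSchema())         # matches the first overload
create_frame([(1, 2)], StructSchema())    # matches the second overload
# create_frame([(1, 2)], AtomicSchema())  # expected to match no overload under mypy
```

The commented-out call corresponds to the rejected invocation quoted above: the tuple rows do not fit the literal overload, and the atomic schema does not fit the row overload.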
[GitHub] [spark] HyukjinKwon commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937391391
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143698/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929823174
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48212/
[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006
**[Test build #143698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143698/testReport)** for PR 34136 at commit [`9b4977d`](https://github.com/apache/spark/commit/9b4977dbef9a10c0a09cb11f7aa1c3c7029b6900).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933908728
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48333/
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718897441
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
> On a side note, we're missing `schema: str` variants, if I am not mistaken.
@zero323 May I ask you to fix the missing variant?
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720405240
##########
File path: python/pyspark/sql/session.py
##########
@@ -492,28 +539,40 @@ def _inferSchema(self, rdd, samplingRatio=None, names=None):
prefer_timestamp_ntz=prefer_timestamp_ntz)).reduce(_merge_type)
return schema
-    def _createFromRDD(self, rdd, schema, samplingRatio):
+    def _createFromRDD(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+        schema: Optional[Union[DataType, List[str]]],
+        samplingRatio: Optional[float],
+    ) -> Tuple["RDD[Tuple]", StructType]:
Review comment:
Following the notes above, this could be overloaded to distinguish between cases where we can and cannot infer the schema. Might be overkill, though.
Just a heads-up: I've encountered some problems related to these specific `Unions` while working on SPARK-36894. They surface only with the `self` type (which is, ironically, not validated), and I am thinking about introducing some `TypeVars` (a more precise choice anyway) as a fix.
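As a rough illustration of why a `TypeVar` can be more precise than a `Union` inside a generic type argument, consider the sketch below. It is not taken from SPARK-36894 and may not match the exact problem mentioned above; `Box`, `Row`, `count_union`, and `count_typevar` are hypothetical names, with `Box` standing in for a generic container such as `RDD[T]`.

```python
from typing import Generic, List, Tuple, TypeVar, Union

T = TypeVar("T")


class Box(Generic[T]):
    """Hypothetical generic container, invariant in T."""

    def __init__(self, items: List[T]) -> None:
        self.items = items


Row = Tuple[int, ...]


def count_union(box: "Box[Union[int, str, Row]]") -> int:
    # Because Box is invariant, only a Box whose element type is exactly
    # this Union is accepted by a type checker.
    return len(box.items)


# Constrained TypeVar: each call is solved to exactly one of the listed types.
E = TypeVar("E", int, str, Row)


def count_typevar(box: "Box[E]") -> int:
    return len(box.items)


ints = Box([1, 2, 3])  # inferred as Box[int]
count_typevar(ints)    # accepted: E is solved to int
# count_union(ints)    # expected to be rejected by mypy (Box[int] vs Box[Union[...]])
```

Both functions behave identically at runtime; the difference is only in what a static checker accepts.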
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720396254
##########
File path: python/pyspark/sql/session.py
##########
@@ -107,10 +126,23 @@ class Builder(object):
         """
         _lock = RLock()
-        _options = {}
+        _options = {}  # type: Dict[str, Any]
Review comment:
Wouldn't it be better to use PEP 526 annotations here?
```python
_options: Dict[str, Any] = {}
```
It doesn't seem we're going to backport hints directly any more, and we're already on Python 3.6 and beyond here.
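For comparison, a small sketch of the two spellings side by side. The class names are hypothetical and unrelated to the real `Builder`. Type checkers read both forms, but only the PEP 526 form is also recorded at runtime, which `typing.get_type_hints` makes visible.

```python
from typing import Any, Dict, get_type_hints


class BuilderWithComment:
    # Type comment: read by type checkers, but not stored on the class.
    _options = {}  # type: Dict[str, Any]


class BuilderWithAnnotation:
    # PEP 526 variable annotation: read by type checkers and also
    # recorded in the class annotations.
    _options: Dict[str, Any] = {}


print(get_type_hints(BuilderWithComment))
print(get_type_hints(BuilderWithAnnotation))
```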
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930584193
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48242/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937677264
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143922/
[GitHub] [spark] HyukjinKwon closed pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34136:
URL: https://github.com/apache/spark/pull/34136
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718879032
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
Ah, cool. I missed that there are annotations in `rdd.pyi`.
I guess we can just mark it `@no_type_check` here for now. Thanks!
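A minimal sketch of what marking a function with `@no_type_check` looks like. The function `to_df` and its parameters are hypothetical stand-ins for a monkey-patched method whose annotations live elsewhere (for example in a stub file); this is not the code from the PR.

```python
from typing import get_type_hints, no_type_check


@no_type_check
def to_df(self, schema=None, sample_ratio=None):
    # Type checkers are asked to skip this definition entirely, so the
    # signature declared elsewhere remains the only source of type information.
    return (schema, sample_ratio)


# The decorator leaves runtime behaviour unchanged and only sets a marker
# attribute that tools such as get_type_hints respect.
print(to_df.__no_type_check__)
print(get_type_hints(to_df))
```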
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718951599
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:
Review comment:
If we remove the annotations, `mypy` won't check the function body.
Making `mypy` check the function body is the purpose of this series of PRs, so that we can more easily catch the misuse of variables.
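The point about unannotated bodies can be sketched as follows. The two functions are hypothetical examples, not code from `session.py`; the behaviour described in the comments is mypy's default, which changes if `--check-untyped-defs` is enabled.

```python
def unannotated():
    # No annotations: mypy treats this function as dynamically typed and,
    # by default, does not check its body, so the str + int bug below is
    # not reported. It only fails when the function is actually called.
    return "count: " + 1


def annotated() -> str:
    # Annotated: the body is checked, so the expression above would be
    # reported here. The explicit conversion keeps it well typed.
    return "count: " + str(1)
```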
[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933837932
**[Test build #143820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143820/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932645923
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48306/
[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932613772
**[Test build #143794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143794/testReport)** for PR 34136 at commit [`fd48809`](https://github.com/apache/spark/commit/fd48809b59ea5134d4cc114545c086cb650cc906).
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932621658
**[Test build #143794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143794/testReport)** for PR 34136 at commit [`fd48809`](https://github.com/apache/spark/commit/fd48809b59ea5134d4cc114545c086cb650cc906).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929800342
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48212/
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-938019872
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48469/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937640599
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48444/
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675212
**[Test build #143922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143922/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-938019872
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48469/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937965518
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48469/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930612750
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48242/
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720653480
##########
File path: python/pyspark/sql/session.py
##########
@@ -492,28 +539,40 @@ def _inferSchema(self, rdd, samplingRatio=None, names=None):
prefer_timestamp_ntz=prefer_timestamp_ntz)).reduce(_merge_type)
return schema
- def _createFromRDD(self, rdd, schema, samplingRatio):
+ def _createFromRDD(
+ self,
+ rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ schema: Optional[Union[DataType, List[str]]],
+ samplingRatio: Optional[float],
+ ) -> Tuple["RDD[Tuple]", StructType]:
Review comment:
> Sure, I'll wait for it and use the TypeVar here and the above.
Oh, I didn't mean that. If any changes are needed later, I'll handle it.
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r721676322
##########
File path: python/pyspark/sql/session.py
##########
@@ -445,7 +487,12 @@ def _inferSchemaFromList(self, data, names=None):
raise ValueError("Some of types cannot be determined after inferring")
return schema
- def _inferSchema(self, rdd, samplingRatio=None, names=None):
+ def _inferSchema(
+ self,
+ rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
Review comment:
Should we use `Any` from `createDataFrame` then?
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675818
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48444/
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718937128
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
Would you mind explaining what the intention is here? Adding `RowLike` to the supported type params and `StructType` to the supported schemas seems to miss the point of having this annotation (I assume the ignore is due to overlap with the previous annotations).
In general this one
https://github.com/apache/spark/blob/aa9064ad96ff7cefaa4381e912608b0b0d39a09c/python/pyspark/sql/session.pyi#L89-L97
was added to support invocations like:
```python
spark.createDataFrame([1], IntegerType())
```
but reject
```python
spark.createDataFrame([(1, 2)], IntegerType())
```
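The accept/reject behaviour described above can be sketched without PySpark using `typing.overload`. This is only an illustration under stated assumptions: `AtomicType`, `StructType`, `LiteralValue`, `RowLike`, and `create_data_frame` below are hypothetical stand-ins, not the real `pyspark.sql` definitions, and the function returns a plain string instead of a `DataFrame`.

```python
from typing import Iterable, Tuple, Union, overload


class AtomicType:
    """Hypothetical stand-in for pyspark.sql.types.AtomicType."""


class StructType:
    """Hypothetical stand-in for pyspark.sql.types.StructType."""


# Simplified aliases; the real PySpark aliases are broader.
LiteralValue = Union[int, float, str, bool]
RowLike = Tuple[LiteralValue, ...]


@overload
def create_data_frame(data: Iterable[LiteralValue], schema: AtomicType) -> str:
    ...


@overload
def create_data_frame(data: Iterable[RowLike], schema: StructType) -> str:
    ...


def create_data_frame(data, schema):
    # Implementation left unannotated here; callers are checked only
    # against the two overloads above.
    kind = "atomic" if isinstance(schema, AtomicType) else "struct"
    return f"{kind}:{len(list(data))}"


create_data_frame([1], AtomicType())       # matches the first overload
create_data_frame([(1, 2)], StructType())  # matches the second overload
# create_data_frame([(1, 2)], AtomicType())
# The call above matches neither overload, so a type checker such as mypy
# would be expected to report it.
```

Pairing scalar data only with an atomic schema, and row-like data only with a struct schema, is what lets a checker flag the mismatched combination.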
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930592048
**[Test build #143731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143731/testReport)** for PR 34136 at commit [`240280c`](https://github.com/apache/spark/commit/240280c87efe63868da4b2cf1a66c1655bf4d08f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718948382
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
My bad, but wouldn't it make more sense to skip annotations here completely?
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720567537
##########
File path: python/pyspark/sql/session.py
##########
@@ -492,28 +539,40 @@ def _inferSchema(self, rdd, samplingRatio=None, names=None):
prefer_timestamp_ntz=prefer_timestamp_ntz)).reduce(_merge_type)
return schema
- def _createFromRDD(self, rdd, schema, samplingRatio):
+ def _createFromRDD(
+ self,
+ rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ schema: Optional[Union[DataType, List[str]]],
+ samplingRatio: Optional[float],
+ ) -> Tuple["RDD[Tuple]", StructType]:
Review comment:
Sure, I'll wait for it and use the `TypeVar` here and the above.
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718945805
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
For overloaded functions, the signature of the actual function (the one that has the function body) is not exposed to type checkers; callers are checked only against the `@overload` signatures.
So type checkers should still raise an error for such a call.
The type hints on the actual function are purely for mypy to check the function body.
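A minimal sketch of this behaviour, using a hypothetical function `describe` that is unrelated to PySpark: the implementation carries broad annotations so that its body can be type-checked, while callers see only the two overloads.

```python
from typing import Optional, Union, overload


@overload
def describe(value: int) -> str:
    ...


@overload
def describe(value: str, prefix: str) -> str:
    ...


def describe(value: Union[int, str], prefix: Optional[str] = None) -> str:
    # These broad annotations are used to check this body only.
    # Call sites are checked against the overloads above.
    if prefix is None:
        return str(value)
    return f"{prefix}{value}"


describe(1)          # matches the first overload
describe("a", "p-")  # matches the second overload
# describe(1, "p-") runs at runtime, because the implementation accepts it,
# but a type checker would be expected to reject it: no overload takes an
# int together with a prefix.
```

Widening the implementation signature therefore does not widen what callers are allowed to pass, as long as the overloads stay narrow.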
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720401238
##########
File path: python/pyspark/sql/session.py
##########
@@ -445,7 +487,12 @@ def _inferSchemaFromList(self, data, names=None):
raise ValueError("Some of types cannot be determined after inferring")
return schema
- def _inferSchema(self, rdd, samplingRatio=None, names=None):
+ def _inferSchema(
+ self,
+ rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
Review comment:
Just wondering about this: I have a feeling that it should be either `RDD[Any]` (type-wise, we can invoke this on an arbitrary RDD) or, if we want to signal that it can succeed only on certain types of RDDs, the `Literal*` variants should be omitted (we don't support schema inference on these).
The same applies to `_inferSchemaFromList`.
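The two options can be contrasted in a small standalone sketch. Everything here is an assumption made for illustration: `RowLike` is simplified to a tuple, and `infer_names_any` and `infer_names_rows` are hypothetical helpers that only produce positional column names, not the real `_inferSchema` logic.

```python
from typing import Any, Iterable, List, Tuple

# Hypothetical simplification of a row-like element type.
RowLike = Tuple[Any, ...]


def infer_names_any(rows: Iterable[Any]) -> List[str]:
    """Option 1: accept any element type and fail at runtime if unsupported."""
    first = next(iter(rows))
    if not isinstance(first, tuple):
        raise TypeError(f"cannot infer field names from {type(first).__name__}")
    return [f"_{i + 1}" for i in range(len(first))]


def infer_names_rows(rows: Iterable[RowLike]) -> List[str]:
    """Option 2: the annotation itself signals that only rows are supported."""
    first = next(iter(rows))
    return [f"_{i + 1}" for i in range(len(first))]


infer_names_any([(1, "a")])   # fine at runtime and for a type checker
infer_names_rows([(1, "a")])  # fine at runtime and for a type checker
# infer_names_rows([1, 2]) would be expected to be flagged by a type checker,
# whereas infer_names_any([1, 2]) type-checks and raises TypeError at runtime.
```

With the first option the restriction lives only in runtime checks; with the second it is visible in the signature, which is why listing literal types there would advertise support that does not exist.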
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932631981
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143794/
[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930601839
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48241/
[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930553559
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723
[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937478707
**[Test build #143922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143922/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-938010616
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48469/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369
[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937478707
**[Test build #143922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143922/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933876331
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143820/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675818
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48444/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937442709
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48426/
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143698/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937478707
**[Test build #143922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143922/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932613772
**[Test build #143794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143794/testReport)** for PR 34136 at commit [`fd48809`](https://github.com/apache/spark/commit/fd48809b59ea5134d4cc114545c086cb650cc906).
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930592377
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143731/
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932631981
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143794/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933855616
**[Test build #143820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143820/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933908728
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48333/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930591151
**[Test build #143730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143730/testReport)** for PR 34136 at commit [`46a0f94`](https://github.com/apache/spark/commit/46a0f94efc5886e3c523b2648f76a17d51cc3f17).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930601839
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48241/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929823157
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48212/
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929823174
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48212/
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718897441
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
> On a side note, we're missing `schema: str` variants, if I am not mistaken.
@zero323 May I ask you to fix the missing variants?
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937442709
[GitHub] [spark] HyukjinKwon closed pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34136:
URL: https://github.com/apache/spark/pull/34136
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48426/
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720567235
##########
File path: python/pyspark/sql/session.py
##########
@@ -525,22 +584,25 @@ def _createFromLocal(self, data, schema):
if schema is None or isinstance(schema, (list, tuple)):
struct = self._inferSchemaFromList(data, names=schema)
converter = _create_converter(struct)
- data = map(converter, data)
+ tupled_data = map(converter, data) # type: Iterable[Tuple]
Review comment:
Updated with PEP 526 annotations.
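
For readers comparing the two styles: the hunk quoted above still shows the older type-comment form, while PEP 526 moves the annotation into the assignment itself. Below is a minimal sketch of the difference. The helper names `to_tuple` and `convert_rows` are hypothetical and are not taken from `session.py`.

```python
from typing import Iterable, List, Tuple


def to_tuple(row: List[int]) -> Tuple[int, ...]:
    """Hypothetical converter standing in for the real row converter."""
    return tuple(row)


def convert_rows(data: Iterable[List[int]]) -> List[Tuple[int, ...]]:
    # Type-comment form, as in the quoted hunk:
    #     tupled_data = map(to_tuple, data)  # type: Iterable[Tuple]
    # PEP 526 form: the annotation is part of the assignment statement.
    tupled_data: Iterable[Tuple[int, ...]] = map(to_tuple, data)
    return list(tupled_data)
```

Binding the converted values to a new name, rather than reassigning `data`, also avoids giving one variable two different types, which a type checker such as `mypy` would normally reject.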
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933837932
**[Test build #143820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143820/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933876331
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143820/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933866322
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48333/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932626863
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48306/
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718842692
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
In general, I am not sure if it makes sense to annotate this here. But if we do, it should be consistent with its RDD counterpart:
https://github.com/apache/spark/blob/e79dd89cf6b513264d8205df1d4561cb07406d79/python/pyspark/rdd.pyi#L445-L452
On a side note, we're missing `schema: str` variants, if I am not mistaken.
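
As an illustration of what a `schema: str` variant could look like, here is a hypothetical, self-contained sketch using `typing.overload`. `FakeRDD` and `FakeFrame` are stand-ins invented for this example and are not PySpark classes; the actual signatures in `rdd.pyi` may differ.

```python
from typing import List, Optional, Tuple, Union, overload


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; records how the schema was given."""

    def __init__(self, schema_kind: str) -> None:
        self.schema_kind = schema_kind


class FakeRDD:
    """Hypothetical stand-in for an RDD of row-like values."""

    @overload
    def toDF(
        self,
        schema: Optional[Union[List[str], Tuple[str, ...]]] = ...,
        sampleRatio: Optional[float] = ...,
    ) -> FakeFrame: ...

    @overload
    def toDF(self, schema: str) -> FakeFrame: ...

    def toDF(
        self,
        schema: Optional[Union[List[str], Tuple[str, ...], str]] = None,
        sampleRatio: Optional[float] = None,
    ) -> FakeFrame:
        # The implementation accepts the union of all overloaded signatures.
        if schema is None:
            return FakeFrame("inferred")
        if isinstance(schema, str):
            return FakeFrame("ddl-string")
        return FakeFrame("column-names")
```

With the second overload in place, a call such as `FakeRDD().toDF("a int")` type-checks, whereas without it a checker would only accept a list or tuple of column names.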
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718951599
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
If we remove the annotations, `mypy` won't check the function body.
Making `mypy` check the function body is one of the purposes of this series of PRs, so that we can more easily catch the misuse of variables.
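
To illustrate the point, here is a minimal sketch with two hypothetical functions that are not part of PySpark. By default, `mypy` treats a function without annotations as dynamically typed and does not report errors inside its body; annotating the signature opts the body into checking.

```python
from typing import List


def untyped_total(values):
    # No annotations: with default settings mypy does not check this body,
    # so a mistake such as `label + sum(values)` would go unreported.
    label = "total: "
    return label + str(sum(values))


def typed_total(values: List[int]) -> str:
    # Annotated: mypy checks the body, so `label + sum(values)` (str + int)
    # would be flagged before the code ever runs.
    label = "total: "
    return label + str(sum(values))
```

Both functions behave the same at runtime; the difference is only in what the type checker is willing to inspect.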
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006
**[Test build #143698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143698/testReport)** for PR 34136 at commit [`9b4977d`](https://github.com/apache/spark/commit/9b4977dbef9a10c0a09cb11f7aa1c3c7029b6900).
[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718842692
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
In general, I am not sure if it makes sense to annotate this here. But if we do, it should be consistent with its RDD counterpart
https://github.com/apache/spark/blob/e79dd89cf6b513264d8205df1d4561cb07406d79/python/pyspark/rdd.pyi#L445-L452
On a side note, we're missing `schema: str` variants, if I am not mistaken.
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
> @zero323 May I ask you to fix the missing variants?
On it
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
Would you mind explaining the intention here? Adding `RowLike` to the supported type params and `StructType` to the supported schemas seems to defeat the purpose of this annotation (I assume the `ignore` is there because of the overlap with the previous annotations).
In general this one
https://github.com/apache/spark/blob/aa9064ad96ff7cefaa4381e912608b0b0d39a09c/python/pyspark/sql/session.pyi#L89-L97
was added to support invocations like:
```python
spark.createDataFrame([1], IntegerType())
```
but reject
```python
spark.createDataFrame([(1, 2)], IntegerType())
```
with
```
error: List item 0 has incompatible type "Tuple[int, int]"; expected "Union[date, float, str, Decimal]"
```
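A minimal, self-contained sketch of the idea, not the actual PySpark code: `AtomicType`, `IntegerType`, `StructType`, and `create_frame` below are toy stand-ins introduced only for illustration, and the literal union is simplified. With overloads shaped like this, a type checker would be expected to accept scalars paired with an atomic schema and to flag a list of tuples paired with one, while nothing is enforced at runtime.

```python
from typing import Iterable, List, Tuple, Union, overload


class AtomicType:  # toy stand-in, not pyspark.sql.types.AtomicType
    pass


class IntegerType(AtomicType):  # toy stand-in
    pass


class StructType:  # toy stand-in, not pyspark.sql.types.StructType
    pass


Literal_ = Union[int, float, str]  # simplified literal union (assumption)
Row_ = Tuple[object, ...]


@overload
def create_frame(data: Iterable[Literal_], schema: AtomicType) -> List[Row_]: ...
@overload
def create_frame(data: Iterable[Row_], schema: StructType) -> List[Row_]: ...


def create_frame(data, schema):
    # Scalars paired with an atomic schema are wrapped into one-column rows.
    if isinstance(schema, AtomicType):
        return [(value,) for value in data]
    return [tuple(row) for row in data]


rows = create_frame([1], IntegerType())  # matches the first overload
# create_frame([(1, 2)], IntegerType())  # a type checker should reject this call
```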
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
My bad, but wouldn't it make more sense to skip the annotations here completely?
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
Thanks for clarifications!
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718879032
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
Ah, cool. I missed that there are already annotations in `rdd.pyi`.
I guess we can just mark it with `@no_type_check` here for now. Thanks!
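For reference, a small sketch of what `@no_type_check` does, using only the standard `typing` module. The function `to_df_like` is hypothetical and merely stands in for the nested `toDF`; the decorator asks static checkers to skip the function, and at runtime it only marks it.

```python
from typing import no_type_check


@no_type_check
def to_df_like(self, schema=None, sampleRatio=None):
    # Hypothetical stand-in for the nested toDF: static checkers skip this
    # function, so a separate stub can remain the source of its types.
    return (schema, sampleRatio)


result = to_df_like(None, schema=["a", "b"])
# The decorator sets a marker attribute on the function object.
marked = getattr(to_df_like, "__no_type_check__", False)
```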
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
> On a side note, we're missing `schema: str` variants, if I am not mistaken.
@zero323 May I ask you to fix the missing variants?
##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
import warnings
from functools import reduce
from threading import RLock
+from types import TracebackType
+from typing import (
+ Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+ cast, no_type_check, overload, TYPE_CHECKING
+)
-from pyspark import since
+from py4j.java_gateway import JavaObject # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
from pyspark.rdd import RDD
from pyspark.sql.conf import RuntimeConfig
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.pandas.conversion import SparkConversionMixin
from pyspark.sql.readwriter import DataFrameReader
from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
- _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import ( # type: ignore[attr-defined]
+ AtomicType, DataType, StructType,
+ _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
_parse_datatype_string
+)
from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
+if TYPE_CHECKING:
+ from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+ from pyspark.sql.catalog import Catalog
+ from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+ from pyspark.sql.streaming import StreamingQueryManager
+ from pyspark.sql.udf import UDFRegistration
+
+
__all__ = ["SparkSession"]
-def _monkey_patch_RDD(sparkSession):
- def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+ def toDF(
+ self: "RDD[RowLike]",
+ schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
Review comment:
Thanks!
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
For overloaded functions, the signature of the actual implementation (the one with the function body) is not exposed to type checkers; callers are checked only against the `@overload` signatures.
So type checkers should still raise such an error.
The type hints on the implementation are purely for mypy to check the function body.
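To illustrate, a minimal sketch with hypothetical names (`describe` is not a PySpark function): call sites are matched against the `@overload` signatures, while the broader annotations on the implementation let the checker verify the body.

```python
from typing import List, Union, overload


@overload
def describe(value: int) -> str: ...
@overload
def describe(value: List[int]) -> str: ...


def describe(value: Union[int, List[int]]) -> str:
    # The broad Union is only used to check this body; call sites are
    # matched against the two overloads above.
    if isinstance(value, list):
        return "list of %d" % len(value)
    return "scalar %d" % value


single = describe(3)
many = describe([1, 2])
```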
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
If we remove the annotations, `mypy` won't check the function body.
Making `mypy` check the function body is one of the purposes of this series of PRs, so that we can more easily catch the misuse of variables.
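The shape of the diff above (several `@overload` stubs followed by one fully annotated implementation) can be sketched on a much smaller scale. This is only an illustration under assumptions: `make_table`, `Row`, and the placeholder column names are hypothetical and are not part of PySpark.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union, overload

Row = Tuple[object, ...]
Table = List[Dict[str, object]]


@overload
def make_table(data: Sequence[Row], schema: None = ...) -> Table:
    ...


@overload
def make_table(data: Sequence[Row], schema: Union[List[str], Tuple[str, ...]]) -> Table:
    ...


def make_table(
    data: Sequence[Row],
    schema: Optional[Union[List[str], Tuple[str, ...]]] = None,
) -> Table:
    # Single runtime implementation. The @overload stubs above exist only
    # for the type checker; because this definition is annotated, its body
    # is type checked as well.
    if schema is None:
        # Placeholder column names, chosen only for this sketch.
        width = len(data[0]) if data else 0
        schema = [f"_{i + 1}" for i in range(width)]
    return [dict(zip(schema, row)) for row in data]
```

Callers see only the overload signatures, while the annotated implementation is what actually runs.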
##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
return SparkSession.builder.getOrCreate()
- def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ samplingRatio: Optional[float] = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[List[str], Tuple[str, ...]] = ...,
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+ ],
+ schema: Union[AtomicType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: Union["RDD[RowLike]", Iterable["RowLike"]],
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+ ) -> DataFrame:
+ ...
+
+ @overload
+ def createDataFrame(
+ self,
+ data: "PandasDataFrameLike",
+ schema: Union[StructType, str],
+ verifySchema: bool = ...,
+ ) -> DataFrame:
+ ...
+
+ def createDataFrame( # type: ignore[misc]
+ self,
+ data: Union[
+ "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+ Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+ "PandasDataFrameLike",
+ ],
+ schema: Optional[Union[AtomicType, StructType, str]] = None,
+ samplingRatio: Optional[float] = None,
+ verifySchema: bool = True
+ ) -> DataFrame:
Review comment:
Thank YOU for asking!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930591482
[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930591482
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143730/
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930557818
**[Test build #143731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143731/testReport)** for PR 34136 at commit [`240280c`](https://github.com/apache/spark/commit/240280c87efe63868da4b2cf1a66c1655bf4d08f).
[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r721676322
##########
File path: python/pyspark/sql/session.py
##########
@@ -445,7 +487,12 @@ def _inferSchemaFromList(self, data, names=None):
raise ValueError("Some of types cannot be determined after inferring")
return schema
- def _inferSchema(self, rdd, samplingRatio=None, names=None):
+ def _inferSchema(
+ self,
+ rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
Review comment:
Should we use `Any` from `createDataFrame` then?
I mean, for `createDataFrame`, `_create_dataframe`, `_createFromRDD`, and `_createFromLocal` as well?
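To illustrate the trade-off behind this question (a hypothetical sketch, not code from the PR; `width_any` and `width_precise` are made-up names): a function annotated with `Any` still counts as annotated, so `mypy` checks its body, but operations on the `Any`-typed values themselves are accepted without checking.

```python
from typing import Any, Iterable, Sequence


def width_any(rows: Iterable[Any]) -> int:
    # `row` is Any, so mypy accepts any operation on it, including
    # ones that would fail at runtime for some inputs.
    for row in rows:
        return len(row)
    return 0


def width_precise(rows: Iterable[Sequence[object]]) -> int:
    # `row` is Sequence[object]: len(row) is accepted, while a call such
    # as row.lower() would be reported by mypy.
    for row in rows:
        return len(row)
    return 0
```

A more precise element type costs longer signatures but lets the checker catch misuse inside the helpers.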
[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930580052
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48241/