Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/29 02:47:30 UTC

[GitHub] [spark] ueshin opened a new pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

ueshin opened a new pull request #34136:
URL: https://github.com/apache/spark/pull/34136


   ### What changes were proposed in this pull request?
   
   Inline type hints from `python/pyspark/sql/session.pyi` to `python/pyspark/sql/session.py`.
   
   ### Why are the changes needed?
   
   Currently, there is a type hint stub file, `python/pyspark/sql/session.pyi`, that shows the expected types for functions, but we can also take advantage of static type checking within the function bodies by inlining the type hints.
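   
   For illustration only (a toy module, not code from this PR): with a stub file, mypy sees just the declared signature, while inlining the hint lets it type-check the implementation as well.
   
   ```python
   # Stub-file style: mymod.pyi declares the signature, and the body in
   # mymod.py stays unchecked by mypy.
   #     def add_one(x: int) -> int: ...
   
   # Inline style: the same hint lives in mymod.py itself, so mypy now
   # checks the body too (e.g. returning a str here would be flagged).
   def add_one(x: int) -> int:
       return x + 1
   ```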
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932645923


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48306/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723








[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718959247



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Thanks for the clarifications!






[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718906758



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       > @zero323 May I ask you to fix the missing variants?
   
   On it






[GitHub] [spark] ueshin commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930492130


   cc @xinrong-databricks @HyukjinKwon @itholic 




[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718948382



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       So wouldn't it make more sense to skip the annotations here completely?






[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718907290



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       Thanks!






[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806472


   **[Test build #143698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143698/testReport)** for PR 34136 at commit [`9b4977d`](https://github.com/apache/spark/commit/9b4977dbef9a10c0a09cb11f7aa1c3c7029b6900).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718960289



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Thank YOU for asking! 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930615886


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48242/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933891517


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48333/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932641942


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48306/
   




[GitHub] [spark] xinrong-databricks commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r719804759



##########
File path: python/pyspark/sql/session.py
##########
@@ -525,22 +584,25 @@ def _createFromLocal(self, data, schema):
         if schema is None or isinstance(schema, (list, tuple)):
             struct = self._inferSchemaFromList(data, names=schema)
             converter = _create_converter(struct)
-            data = map(converter, data)
+            tupled_data = map(converter, data)  # type: Iterable[Tuple]

Review comment:
       I'm curious: why use a `# type: a_type` comment here rather than `typing.cast`?
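   
   For reference, a minimal self-contained sketch of the two spellings being compared (hypothetical names, not the PR code); both are runtime no-ops and only guide the type checker:
   
   ```python
   from typing import Iterable, Tuple, cast
   
   pairs = [(1, "a"), (2, "b")]  # hypothetical data, for illustration only
   
   # Variant 1: a type comment; mypy reads it, nothing changes at runtime.
   as_tuples = iter(pairs)  # type: Iterable[Tuple]
   
   # Variant 2: typing.cast; same effect for the type checker, also a no-op.
   as_tuples = cast(Iterable[Tuple], iter(pairs))
   ```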






[GitHub] [spark] HyukjinKwon commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937391391


   Let me merge this to go forward, but please let me know if there are more things to fix up together, @zero323 🙏 




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937677264


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143922/
   






[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471352


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48426/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930601800


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48241/
   




[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718842692



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       In general, I am not sure if it makes sense to annotate this here. But if we do, it should be consistent with its RDD counterpart
   
   https://github.com/apache/spark/blob/e79dd89cf6b513264d8205df1d4561cb07406d79/python/pyspark/rdd.pyi#L445-L452
   
   On a side note, we're missing `schema: str` variants, if I am not mistaken.
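   
   For instance, a rough sketch of the kind of `schema: str` overload variant being referred to (an assumption about the shape, not a final signature):
   
   ```python
   from typing import TYPE_CHECKING, Optional, overload
   
   if TYPE_CHECKING:
       from pyspark.rdd import RDD
       from pyspark.sql._typing import RowLike
       from pyspark.sql.dataframe import DataFrame
   
   # Hypothetical extra overload accepting a DDL-formatted schema string,
   # alongside the existing List[str]/Tuple[str, ...] and DataType variants.
   @overload
   def toDF(
       self: "RDD[RowLike]",
       schema: Optional[str] = ...,
       sampleRatio: Optional[float] = ...,
   ) -> "DataFrame":
       ...
   ```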






[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930553559


   **[Test build #143730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143730/testReport)** for PR 34136 at commit [`46a0f94`](https://github.com/apache/spark/commit/46a0f94efc5886e3c523b2648f76a17d51cc3f17).




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930615886


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48242/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675771


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48444/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48426/
   




[GitHub] [spark] HyukjinKwon commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937391574


   Merged to master.




[GitHub] [spark] xinrong-databricks commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-931744148


   LGTM, thanks!




[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718937128



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Would you mind explaining what the intention is here? Adding `RowLike` to the supported type params and `StructType` to the supported schemas seems to miss the point of having this annotation (I assume the ignore is due to overlap with the previous annotations).
   
   In general, this one 
   
   https://github.com/apache/spark/blob/aa9064ad96ff7cefaa4381e912608b0b0d39a09c/python/pyspark/sql/session.pyi#L89-L97
   
   was added to support invocations like:
   
   ```python
   spark.createDataFrame([1], IntegerType())
   ```
   but reject
   
   ```python
   spark.createDataFrame([(1, 2)], IntegerType())
   ```
   
   with 
   
   ```
   error: List item 0 has incompatible type "Tuple[int, int]"; expected "Union[date, float, str, Decimal]"
   ```
   








[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143698/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929823174


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48212/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006


   **[Test build #143698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143698/testReport)** for PR 34136 at commit [`9b4977d`](https://github.com/apache/spark/commit/9b4977dbef9a10c0a09cb11f7aa1c3c7029b6900).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933908728


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48333/
   




[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718897441



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       > On a side note, we're missing `schema: str` variants, if I am not mistaken.
   
   @zero323 May I ask you to fix the missing variant?






[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720405240



##########
File path: python/pyspark/sql/session.py
##########
@@ -492,28 +539,40 @@ def _inferSchema(self, rdd, samplingRatio=None, names=None):
                 prefer_timestamp_ntz=prefer_timestamp_ntz)).reduce(_merge_type)
         return schema
 
-    def _createFromRDD(self, rdd, schema, samplingRatio):
+    def _createFromRDD(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+        schema: Optional[Union[DataType, List[str]]],
+        samplingRatio: Optional[float],
+    ) -> Tuple["RDD[Tuple]", StructType]:

Review comment:
       Following the notes above, this could be overloaded to distinguish between cases where we can and cannot infer the schema. Might be overkill, though.
   
   Just a heads-up ‒ I've encountered some problems related to these specific `Unions` while working on SPARK-36894. They surface only with the `self` type (which is, ironically, not validated), and I am thinking about introducing some `TypeVars` (a more precise choice anyway) as a fix.
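   
   As a rough sketch of the `TypeVar` idea (hypothetical name and placement, not what was eventually implemented):
   
   ```python
   from typing import TYPE_CHECKING, TypeVar
   
   if TYPE_CHECKING:
       from pyspark.sql._typing import DateTimeLiteral, DecimalLiteral, LiteralType, RowLike
   
   # A constrained TypeVar that could stand in for the repeated
   # Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike], so the
   # element type is carried through the signature instead of being widened.
   DataLike = TypeVar(
       "DataLike", "DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"
   )
   ```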






[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720396254



##########
File path: python/pyspark/sql/session.py
##########
@@ -107,10 +126,23 @@ class Builder(object):
         """
 
         _lock = RLock()
-        _options = {}
+        _options = {}  # type: Dict[str, Any]

Review comment:
       Wouldn't it be better to use PEP 526 annotations here?
   
   ```python
   _options: Dict[str, Any] = {}
   ```
   
   It doesn't seem we're going to backport hints directly any more, and we're already on Python 3.6 and beyond here.






[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930584193


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48242/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937677264


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143922/
   




[GitHub] [spark] HyukjinKwon closed pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34136:
URL: https://github.com/apache/spark/pull/34136


   




[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718879032



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       Ah, cool. I missed that the annotations are already in `rdd.pyi`.
   I guess we can just mark it `@no_type_check` here for now. Thanks!
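   
   For reference, a minimal sketch of that suggestion (illustrative only):
   
   ```python
   from typing import no_type_check
   
   @no_type_check
   def toDF(self, schema=None, sampleRatio=None):
       # With @no_type_check, mypy skips this function entirely and the
       # annotations already present in rdd.pyi remain the source of truth.
       ...
   ```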






[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718951599



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       If we remove the annotations, `mypy` won't check the function body.
   Making `mypy` check the function body is the purpose of this series of PRs, so that we can more easily catch misuse of variables.
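   
   A small generic illustration (not Spark code): unless `--check-untyped-defs` is enabled, mypy only checks the bodies of functions that carry annotations.
   
   ```python
   def untyped(value):
       return value + 1   # body is skipped: the function has no annotations
   
   def typed(value: str) -> int:
       return value + 1   # flagged: unsupported operand types ("str" and "int")
   ```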






[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718879032



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       Ah, cool. I missed that the annotations are already in `rdd.pyi`.
   I guess we can just mark it with `@no_type_check` here. Thanks!






[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933837932


   **[Test build #143820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143820/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932645923


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48306/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932613772


   **[Test build #143794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143794/testReport)** for PR 34136 at commit [`fd48809`](https://github.com/apache/spark/commit/fd48809b59ea5134d4cc114545c086cb650cc906).




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932621658


   **[Test build #143794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143794/testReport)** for PR 34136 at commit [`fd48809`](https://github.com/apache/spark/commit/fd48809b59ea5134d4cc114545c086cb650cc906).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929800342


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48212/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-938019872


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48469/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937640599


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48444/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369








[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675212


   **[Test build #143922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143922/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
    * This patch passes all tests.
    * This patch **does not merge cleanly**.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-938019872


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48469/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937965518


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48469/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930612750


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48242/
   




[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720653480



##########
File path: python/pyspark/sql/session.py
##########
@@ -492,28 +539,40 @@ def _inferSchema(self, rdd, samplingRatio=None, names=None):
                 prefer_timestamp_ntz=prefer_timestamp_ntz)).reduce(_merge_type)
         return schema
 
-    def _createFromRDD(self, rdd, schema, samplingRatio):
+    def _createFromRDD(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+        schema: Optional[Union[DataType, List[str]]],
+        samplingRatio: Optional[float],
+    ) -> Tuple["RDD[Tuple]", StructType]:

Review comment:
       > Sure, I'll wait for it and use the TypeVar here and the above.
   
   Oh, I didn't mean that. If any changes are needed later, I'll handle it.






[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r721676322



##########
File path: python/pyspark/sql/session.py
##########
@@ -445,7 +487,12 @@ def _inferSchemaFromList(self, data, names=None):
             raise ValueError("Some of types cannot be determined after inferring")
         return schema
 
-    def _inferSchema(self, rdd, samplingRatio=None, names=None):
+    def _inferSchema(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",

Review comment:
       Should we use `Any` from `createDataFrame` then?






[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675818


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48444/
   




[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718937128



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Would you mind explaining what the intention is here? Adding `RowLike` to the supported type params and `StructType` to the supported schemas seems to miss the point of having this annotation (I assume the ignore is due to overlap with the previous annotations).
   
   In general, this one
   
   https://github.com/apache/spark/blob/aa9064ad96ff7cefaa4381e912608b0b0d39a09c/python/pyspark/sql/session.pyi#L89-L97
   
   was added to support invocations like:
   
   ```python
   spark.createDataFrame([1], IntegerType())
   ```
   but reject
   
   ```python
   spark.createDataFrame([(1, 2)], IntegerType())
   ```
   






[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930592048


   **[Test build #143731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143731/testReport)** for PR 34136 at commit [`240280c`](https://github.com/apache/spark/commit/240280c87efe63868da4b2cf1a66c1655bf4d08f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006








[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718948382



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       My bad, but wouldn't it make more sense to skip the annotations here completely?






[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720567537



##########
File path: python/pyspark/sql/session.py
##########
@@ -492,28 +539,40 @@ def _inferSchema(self, rdd, samplingRatio=None, names=None):
                 prefer_timestamp_ntz=prefer_timestamp_ntz)).reduce(_merge_type)
         return schema
 
-    def _createFromRDD(self, rdd, schema, samplingRatio):
+    def _createFromRDD(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+        schema: Optional[Union[DataType, List[str]]],
+        samplingRatio: Optional[float],
+    ) -> Tuple["RDD[Tuple]", StructType]:

Review comment:
       Sure, I'll wait for it and use the `TypeVar` here and the above.
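   
   Roughly the shape a `TypeVar`-based signature could take (a hypothetical, simplified sketch over plain lists rather than `RDD`):
   
   ```python
   from typing import List, Optional, Tuple, TypeVar
   
   T = TypeVar("T")
   
   def sample_rows(rows: List[T], sampling_ratio: Optional[float] = None) -> Tuple[List[T], int]:
       # The element type T flows through unchanged instead of being widened
       # to a Union of every accepted literal type.
       sampled = rows if sampling_ratio is None else rows[: max(1, int(len(rows) * sampling_ratio))]
       return sampled, len(sampled)
   ```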






[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718945805



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       For overloaded functions, the actual function that has the function body is not exposed to the type checking libraries.
   So the type checking libraries should still raise such an error.
   
   The type hints for the actual function are purely for mypy to check the function body.
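   
   A generic sketch of that behaviour (not the actual `createDataFrame` signatures): callers are matched only against the `@overload` stubs, while the implementation's annotations exist so that mypy can check its body.
   
   ```python
   from typing import Union, overload
   
   @overload
   def parse(value: int) -> int: ...
   @overload
   def parse(value: str) -> float: ...
   
   def parse(value: Union[int, str]) -> Union[int, float]:
       # Callers only ever see the two @overload stubs above; this signature
       # is what mypy uses to type-check the body.
       if isinstance(value, int):
           return value
       return float(value)
   
   # parse([1, 2])  # still rejected by a type checker: no overload variant accepts a list
   ```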






[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720401238



##########
File path: python/pyspark/sql/session.py
##########
@@ -445,7 +487,12 @@ def _inferSchemaFromList(self, data, names=None):
             raise ValueError("Some of types cannot be determined after inferring")
         return schema
 
-    def _inferSchema(self, rdd, samplingRatio=None, names=None):
+    def _inferSchema(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",

Review comment:
       Just wondering about this ‒ I have a feeling that it should be either `RDD[Any]` (type-wise we can invoke this on an arbitrary RDD) or, if we want to signal that it can succeed only on certain types of RDDs, the `Literal*` variants should be omitted (we don't support schema inference on these).
   
   Same applies to `_inferSchemaFromList`.
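   
   Sketched on a toy function (hypothetical, not the PR's `_inferSchema`), the two options read roughly as:
   
   ```python
   from typing import Any, Sequence, Tuple
   
   # Option 1: accept any element type; the call is always well-typed, even if
   # inference can fail at runtime for unsupported elements.
   def infer_arity_any(rows: Sequence[Any]) -> int:
       return len(rows[0])
   
   # Option 2: advertise that only row-like (tuple) elements are supported, so
   # a sequence of bare scalars is flagged statically.
   def infer_arity_rows(rows: Sequence[Tuple[Any, ...]]) -> int:
       return len(rows[0])
   ```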






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932631981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143794/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006








[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930601839


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48241/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930553559








[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723








[GitHub] [spark] SparkQA removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937478707


   **[Test build #143922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143922/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-938010616


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48469/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369










[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933876331


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143820/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937675818


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48444/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937442709


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48426/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929806723


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143698/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937478707


   **[Test build #143922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143922/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932613772


   **[Test build #143794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143794/testReport)** for PR 34136 at commit [`fd48809`](https://github.com/apache/spark/commit/fd48809b59ea5134d4cc114545c086cb650cc906).




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930592377


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143731/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932631981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143794/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933855616


   **[Test build #143820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143820/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933908728


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48333/
   




[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930591151


   **[Test build #143730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143730/testReport)** for PR 34136 at commit [`46a0f94`](https://github.com/apache/spark/commit/46a0f94efc5886e3c523b2648f76a17d51cc3f17).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930601839


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48241/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929823157


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48212/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929823174


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48212/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718897441



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       > On a side note, we're missing `schema: str` variants, if I am not mistaken.
   
   @zero323 May I ask you to fix the missing variants?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937442709






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34136:
URL: https://github.com/apache/spark/pull/34136


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-937471369


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48426/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720567235



##########
File path: python/pyspark/sql/session.py
##########
@@ -525,22 +584,25 @@ def _createFromLocal(self, data, schema):
         if schema is None or isinstance(schema, (list, tuple)):
             struct = self._inferSchemaFromList(data, names=schema)
             converter = _create_converter(struct)
-            data = map(converter, data)
+            tupled_data = map(converter, data)  # type: Iterable[Tuple]

Review comment:
       Updated with PEP 526 annotations.
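   
   For context, a minimal standalone sketch of the two styles (the `converter` below is only an illustrative stand-in, not the real `_create_converter` result):
   
   ```python
   from typing import Iterable, Tuple
   
   data = [("Alice", 1), ("Bob", 2)]
   converter = tuple  # illustrative stand-in for the real converter
   
   # Old style: a type comment
   tupled_data = map(converter, data)  # type: Iterable[Tuple]
   
   # PEP 526 style: a variable annotation
   tupled_data_annotated: Iterable[Tuple] = map(converter, data)
   ```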




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933837932


   **[Test build #143820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143820/testReport)** for PR 34136 at commit [`794fc0d`](https://github.com/apache/spark/commit/794fc0d142f257daa19dfe2a6e4a2cd21f26f3d7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933876331


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143820/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-933866322


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48333/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-932626863


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48306/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718842692



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       In general, I am not sure if it makes sense to annotate this here. But if we do, it should be consistent with its RDD counterpart
   
   https://github.com/apache/spark/blob/e79dd89cf6b513264d8205df1d4561cb07406d79/python/pyspark/rdd.pyi#L445-L452
   
   On a side note, we're missing `schema: str` variants, if I am not mistaken.
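   
   Purely as an illustration of the kind of `str` variant being discussed (hypothetical stand-in types below, not the actual contents of `rdd.pyi`):
   
   ```python
   from typing import List, Optional, Tuple, Union, overload
   
   
   class DataFrame: ...   # stand-in for pyspark.sql.DataFrame
   class StructType: ...  # stand-in for pyspark.sql.types.StructType
   
   
   @overload
   def toDF(
       schema: Optional[Union[List[str], Tuple[str, ...]]] = ...,
       sampleRatio: Optional[float] = ...,
   ) -> DataFrame: ...
   @overload
   def toDF(schema: Union[StructType, str] = ...) -> DataFrame: ...  # accepts a DDL string
   
   
   def toDF(schema=None, sampleRatio=None):
       """Implementation; callers are type-checked only against the overloads above."""
   ```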




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718951599



##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       If we remove the annotations, `mypy` won't check the function body.
   Making `mypy` check the function body is one of the purposes of this series of PRs, so that we can more easily catch misuse of variables.
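   
   As a generic illustration (not Spark code): by default mypy skips the body of an unannotated function, while annotating the signature makes the same bug visible.
   
   ```python
   def add_untyped(a, b):
       # mypy treats everything here as Any and reports nothing,
       # even though calling add_untyped(1, 2) fails at runtime.
       return a + "oops"
   
   
   def add_typed(a: int, b: int) -> int:
       # With annotations, mypy flags this body:
       # Unsupported operand types for + ("int" and "str")
       return a + "oops"
   ```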




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-929794006


   **[Test build #143698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143698/testReport)** for PR 34136 at commit [`9b4977d`](https://github.com/apache/spark/commit/9b4977dbef9a10c0a09cb11f7aa1c3c7029b6900).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718842692



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       In general, I am not sure if it makes sense to annotate this here. But if we do, it should be consistent with its RDD counterpart
   
   https://github.com/apache/spark/blob/e79dd89cf6b513264d8205df1d4561cb07406d79/python/pyspark/rdd.pyi#L445-L452
   
   On a side note, we're missing `schema: str` variants, if I am not mistaken.

##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       In general, I am not sure if it makes sense to annotate this here. But if we do, it should be consistent with its RDD counterpart
   
   https://github.com/apache/spark/blob/e79dd89cf6b513264d8205df1d4561cb07406d79/python/pyspark/rdd.pyi#L445-L452
   
   On a side note, we're missing `schema: str` variants, if I am not mistaken.

##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       > @zero323 May I ask you to fix the missing variants?
   
   On it

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Would you mind explaining what is the intention here? Adding `RowLike` to supported type params and `StructType` to supported schemas seems to miss the point of having this annotation (I assume ignore is due to overlap with previous annotations).
   
       In general, this one
   
   https://github.com/apache/spark/blob/aa9064ad96ff7cefaa4381e912608b0b0d39a09c/python/pyspark/sql/session.pyi#L89-L97
   
   was added to support invocations like:
   
   ```python
   spark.createDataFrame([1], IntegerType())
   ```
   but reject
   
   ```python
   spark.createDataFrame([(1, 2)], IntegerType())
   ```
   

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Would you mind explaining what is the intention here? Adding `RowLike` to supported type params and `StructType` to supported schemas seems to miss the point of having this annotation (I assume ignore is due to overlap with previous annotations).
   
       In general, this one
   
   https://github.com/apache/spark/blob/aa9064ad96ff7cefaa4381e912608b0b0d39a09c/python/pyspark/sql/session.pyi#L89-L97
   
   was added to support invocations like:
   
   ```python
   spark.createDataFrame([1], IntegerType())
   ```
   but reject
   
   ```python
   spark.createDataFrame([(1, 2)], IntegerType())
   ```
   
   with 
   
   ```
   error: List item 0 has incompatible type "Tuple[int, int]"; expected "Union[date, float, str, Decimal]"
   ```
   

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       So wouldn't it make more sense to skip the annotations here completely?

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       My bad, but wouldn't it make more sense to skip the annotations here completely?

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Thanks for clarifications!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r718879032



##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       Ah, cool. I missed that the annotations are already in `rdd.pyi`.
   I guess we can just mark it with `@no_type_check` here. Thanks!

##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       Ah, cool. I missed that the annotations are already in `rdd.pyi`.
   I guess we can just mark it with `@no_type_check` here for now. Thanks!
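   
   A minimal sketch of that option (simplified and hypothetical body, not the exact patch):
   
   ```python
   from typing import no_type_check
   
   
   def _monkey_patch_RDD_example(sparkSession) -> None:
       @no_type_check
       def toDF(self, schema=None, sampleRatio=None):
           # mypy skips this body; the user-facing signature stays in rdd.pyi.
           return sparkSession.createDataFrame(self, schema, sampleRatio)
   
       # The real helper would then attach the function, e.g. RDD.toDF = toDF
   ```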

##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       > On a side note, we're missing `schema: str` variants, if I am not mistaken.
   
   @zero323 May I ask you to fix the missing variant?

##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       > On a side note, we're missing `schema: str` variants, if I am not mistaken.
   
   @zero323 May I ask you to fix the missing variants?

##########
File path: python/pyspark/sql/session.py
##########
@@ -19,24 +19,46 @@
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
-    def toDF(self, schema=None, sampleRatio=None):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    def toDF(
+        self: "RDD[RowLike]",
+        schema: Optional[Union[List[str], Tuple[str, ...]]] = None,

Review comment:
       Thanks!

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       For overloaded functions, the implementation function that carries the body is not exposed to type-checking tools,
   so those tools will still raise such an error.
   
   The type hints on the implementation are there purely so that mypy can check the function body.
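   
   A generic example of that behaviour (not Spark code): callers are matched only against the `@overload` signatures, while the broad annotation on the implementation is what mypy uses to check the body.
   
   ```python
   from typing import Union, overload
   
   
   @overload
   def to_text(value: int) -> str: ...
   @overload
   def to_text(value: str) -> str: ...
   
   
   def to_text(value: Union[int, str]) -> str:
       # mypy checks this body against Union[int, str],
       # but this implementation signature is invisible to callers.
       return str(value)
   
   
   to_text(1)      # OK: first overload
   to_text("a")    # OK: second overload
   # to_text(1.0)  # rejected by mypy: no overload variant matches "float"
   ```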

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       If we remove the annotations, `mypy` won't check the function body.
   Making `mypy` check the function body is the purpose of this series of PRs, so that we can more easily catch misuse of variables.

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       If we remove the annotations, `mypy` won't check the function body.
    Making `mypy` check the function body is one of the purposes of this series of PRs, so that we can more easily catch misuse of variables.
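
A simplified, hypothetical sketch of the `@overload` pattern used in the hunk above (made-up function and types, not the Spark signatures): the `...` overloads only describe the accepted call shapes, while the single annotated implementation is what mypy actually type-checks.

```python
from typing import Iterable, Optional, Union, overload

@overload
def create(data: Iterable[int], schema: None = ...) -> list:
    ...

@overload
def create(data: Iterable[str], schema: str = ...) -> list:
    ...

def create(
    data: Union[Iterable[int], Iterable[str]],
    schema: Optional[str] = None,
) -> list:
    # Only this annotated implementation body is type-checked;
    # call sites are matched against the @overload signatures above.
    return list(data)
```

The `# type: ignore[misc]` on the real implementation is presumably there because not every positional call allowed by the overloads (e.g. `samplingRatio` vs. `schema` as the second argument) is accepted by the implementation signature, which mypy reports under the `misc` error code.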

##########
File path: python/pyspark/sql/session.py
##########
@@ -566,7 +629,70 @@ def _create_shell_session():
 
         return SparkSession.builder.getOrCreate()
 
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        samplingRatio: Optional[float] = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[List[str], Tuple[str, ...]] = ...,
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral"]],
+        ],
+        schema: Union[AtomicType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: Union["RDD[RowLike]", Iterable["RowLike"]],
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> DataFrame:
+        ...
+
+    def createDataFrame(  # type: ignore[misc]
+        self,
+        data: Union[
+            "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+            Iterable[Union["DateTimeLiteral", "LiteralType", "DecimalLiteral", "RowLike"]],
+            "PandasDataFrameLike",
+        ],
+        schema: Optional[Union[AtomicType, StructType, str]] = None,
+        samplingRatio: Optional[float] = None,
+        verifySchema: bool = True
+    ) -> DataFrame:

Review comment:
       Thank YOU for asking! 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930591482






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930591482


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143730/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930557818


   **[Test build #143731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143731/testReport)** for PR 34136 at commit [`240280c`](https://github.com/apache/spark/commit/240280c87efe63868da4b2cf1a66c1655bf4d08f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a change in pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r721676322



##########
File path: python/pyspark/sql/session.py
##########
@@ -445,7 +487,12 @@ def _inferSchemaFromList(self, data, names=None):
             raise ValueError("Some of types cannot be determined after inferring")
         return schema
 
-    def _inferSchema(self, rdd, samplingRatio=None, names=None):
+    def _inferSchema(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",

Review comment:
       Should we use `Any` starting from `createDataFrame`, then?
    I mean, for `createDataFrame`, `_create_dataframe`, `_createFromRDD`, and `_createFromLocal` as well?
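
For reference, a rough, self-contained illustration of the trade-off being discussed (hypothetical helpers, not the PR's code): a precise element type lets mypy catch misuse inside the helper's body, while `Any` silences those checks.

```python
from typing import Any, Iterable, Union

Element = Union[int, float, str]

def first_as_text_precise(data: Iterable[Element]) -> str:
    first = next(iter(data))
    # mypy flags this: "int" and "float" items have no attribute "upper"
    return first.upper()

def first_as_text_any(data: Iterable[Any]) -> str:
    first = next(iter(data))
    # not flagged: with Any, attribute access goes unchecked
    return first.upper()
```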




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34136: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34136:
URL: https://github.com/apache/spark/pull/34136#issuecomment-930580052


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48241/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org