Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/24 20:54:43 UTC

[GitHub] [spark] ueshin opened a new pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

ueshin opened a new pull request #34101:
URL: https://github.com/apache/spark/pull/34101


   ### What changes were proposed in this pull request?
   
   Inlines the type hint files under the `pyspark/sql/pandas` folder, except for `pyspark/sql/pandas/functions.pyi` and the files under `pyspark/sql/pandas/_typing`.
   
   - `functions.pyi` contains a lot of overloads, so we should revisit and manage it separately.
   - We can't inline the files under `pyspark/sql/pandas/_typing` because they include new syntax for type hints.
   
   ### Why are the changes needed?
   
   Currently there are type hint stub files (`*.pyi`) to show the expected types for functions, but we can also take advantage of static type checking within the functions by inlining the type hints.
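   
   As a rough illustration of the difference (a minimal sketch with hypothetical function names, not code from this PR): when a `.pyi` stub sits next to a `.py` module, mypy reads signatures from the stub and does not analyse the `.py` body, whereas inline hints let it check the body as well.

```python
from typing import List, Optional, get_type_hints


def first_or_default_untyped(values, default=None):
    # No inline hints: if the types lived only in a .pyi stub, the checker
    # would use the stub's signature and skip this body.
    return values[0] if values else default


def first_or_default(values: List[int], default: Optional[int] = None) -> Optional[int]:
    # Inline hints: the checker can verify this body against the signature.
    return values[0] if values else default


# The unannotated function carries no type information at all.
print(get_type_hints(first_or_default_untyped))  # prints {}
# The inlined version exposes its hints directly on the function object.
print(get_type_hints(first_or_default))
```

   Both functions behave identically at runtime; only the second gives a type checker something to verify inside the function.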
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926932659


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143613/
   




[GitHub] [spark] xinrong-databricks commented on a change in pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34101:
URL: https://github.com/apache/spark/pull/34101#discussion_r717822966



##########
File path: python/pyspark/sql/pandas/conversion.py
##########
@@ -301,27 +314,50 @@ class SparkConversionMixin(object):
     Min-in for the conversion from pandas to Spark. Currently, only :class:`SparkSession`
     can use this class.
     """
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> "DataFrame":
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> "DataFrame":
+        ...
+
+    def createDataFrame(  # type: ignore[misc]

Review comment:
       Since we have `@overload`s, do we have to annotate the original function, here `createDataFrame`?






[GitHub] [spark] SparkQA commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926922191


   **[Test build #143613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143613/testReport)** for PR 34101 at commit [`0a43396`](https://github.com/apache/spark/commit/0a43396ce3da47024db39f27ffcc9f28911cf1ab).




[GitHub] [spark] SparkQA commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926932417


   **[Test build #143613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143613/testReport)** for PR 34101 at commit [`0a43396`](https://github.com/apache/spark/commit/0a43396ce3da47024db39f27ffcc9f28911cf1ab).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926954350


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48125/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926954360


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48125/
   




[GitHub] [spark] HyukjinKwon closed pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34101:
URL: https://github.com/apache/spark/pull/34101


   




[GitHub] [spark] xinrong-databricks commented on a change in pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #34101:
URL: https://github.com/apache/spark/pull/34101#discussion_r717822966



##########
File path: python/pyspark/sql/pandas/conversion.py
##########
@@ -301,27 +314,50 @@ class SparkConversionMixin(object):
     Min-in for the conversion from pandas to Spark. Currently, only :class:`SparkSession`
     can use this class.
     """
-    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
+
+    @overload
+    def createDataFrame(
+        self, data: "PandasDataFrameLike", samplingRatio: Optional[float] = ...
+    ) -> "DataFrame":
+        ...
+
+    @overload
+    def createDataFrame(
+        self,
+        data: "PandasDataFrameLike",
+        schema: Union[StructType, str],
+        verifySchema: bool = ...,
+    ) -> "DataFrame":
+        ...
+
+    def createDataFrame(  # type: ignore[misc]

Review comment:
       Since we have `@overload`s, do we have to annotate the implementation of `createDataFrame` here?
       Can we simply ignore the `[no-untyped-def]` error instead?
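
   For context on the question above, a minimal sketch of how `typing.overload` behaves, using a hypothetical `make_frame` function rather than the real `createDataFrame`. The `@overload` definitions are only read by the type checker; the last, undecorated definition is the one that runs. Leaving it unannotated is valid Python, but under mypy's `disallow_untyped_defs` option it is reported as `[no-untyped-def]`, and by default mypy does not type check the body of an unannotated function.

```python
from typing import List, overload


@overload
def make_frame(data: List[int]) -> str: ...


@overload
def make_frame(data: List[int], schema: str, verify: bool = ...) -> str: ...


def make_frame(data, schema=None, verify=True):
    # Implementation: the only definition that exists at runtime.
    # It is deliberately left unannotated to mirror the question above.
    if schema is None:
        return f"inferred:{len(data)}"
    return f"{schema}:{len(data)}:{verify}"


print(make_frame([1, 2]))           # prints inferred:2
print(make_frame([1, 2], "a int"))  # prints a int:2:True
print(make_frame.__annotations__)   # prints {}
```

   Callers are still checked against the two overload signatures either way; annotating the implementation mainly buys checking of the function body itself.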





[GitHub] [spark] xinrong-databricks commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-929512362


   LGTM, thanks!




[GitHub] [spark] AmplabJenkins commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926954360


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48125/
   


[GitHub] [spark] SparkQA commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926938373


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48125/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926922191


   **[Test build #143613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143613/testReport)** for PR 34101 at commit [`0a43396`](https://github.com/apache/spark/commit/0a43396ce3da47024db39f27ffcc9f28911cf1ab).




[GitHub] [spark] HyukjinKwon commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-929719860


   Merged to master.




[GitHub] [spark] AmplabJenkins commented on pull request #34101: [SPARK-36846][PYTHON] Inline most of type hint files under pyspark/sql/pandas folder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34101:
URL: https://github.com/apache/spark/pull/34101#issuecomment-926932659


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143613/
   

