Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/10 11:27:42 UTC

[GitHub] [spark] Yikun opened a new pull request, #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Yikun opened a new pull request, #37465:
URL: https://github.com/apache/spark/pull/37465

   ### What changes were proposed in this pull request?
   This PR proposes to improve the examples in `pyspark.sql.types` by making each example self-contained, with a brief explanation and a slightly more realistic example.
   
   ### Why are the changes needed?
   To make the documentation more readable and to let the examples be copied and pasted directly into the PySpark shell.
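A minimal stdlib-only sketch, not part of this PR, of one way to check that a docstring example is self-contained: run it with no pre-injected globals. An example that imports what it uses passes; one that relies on a name supplied by the test harness fails with a `NameError`. Both docstrings and the helper `failures_with_empty_globals` are hypothetical illustrations.

```python
import doctest

# Hypothetical example that imports everything it uses, so it can be pasted
# into a fresh interpreter session as-is.
SELF_CONTAINED = """
>>> from math import sqrt
>>> sqrt(16)
4.0
"""

# Hypothetical example that silently depends on a name injected by a harness.
NOT_SELF_CONTAINED = """
>>> sqrt(16)
4.0
"""


def failures_with_empty_globals(docstring: str) -> int:
    """Run the examples in `docstring` with no pre-injected names; count failures."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "snippet", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the failure count matters here.
    return runner.run(test, out=lambda _: None).failed


print(failures_with_empty_globals(SELF_CONTAINED))      # 0
print(failures_with_empty_globals(NOT_SELF_CONTAINED))  # 1
```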
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, documentation only.
   
   
   ### How was this patch tested?
   Ran each test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #37465:
URL: https://github.com/apache/spark/pull/37465#issuecomment-1211460430

   cc @HyukjinKwon 



[GitHub] [spark] Yikun commented on a diff in pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #37465:
URL: https://github.com/apache/spark/pull/37465#discussion_r943361589


##########
python/pyspark/sql/types.py:
##########
@@ -2137,7 +2181,9 @@ def _test() -> None:
     sc = SparkContext("local[4]", "PythonTest")
     globs["sc"] = sc

Review Comment:
   Good catch! thanks!




[GitHub] [spark] Yikun commented on a diff in pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #37465:
URL: https://github.com/apache/spark/pull/37465#discussion_r943080421


##########
python/pyspark/sql/types.py:
##########
@@ -684,6 +702,30 @@ class StructType(DataType):
     ...     StructField("f2", IntegerType(), False)])
     >>> struct1 == struct2
     False
+
+    The below example demonstrates how to create a struct using StructType & StructField on
+    DataFrame:
+
+    >>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
+    >>> schema = StructType([
+    ...     StructField("name", StringType()),
+    ...     StructField("languagesSkills", ArrayType(StringType())),
+    ... ])
+    >>> df = spark.createDataFrame(data=data, schema=schema)
+    >>> df.printSchema()
+    root
+     |-- name: string (nullable = true)
+     |-- languagesSkills: array (nullable = true)
+     |    |-- element: string (containsNull = true)
+    <BLANKLINE>

Review Comment:
   Ah, good suggestion; it was copied from other existing code. Addressed!
   
   I will also submit a PR to address the other places:
   
   ```
   spark git:(type) ✗ grep -r BLANKLINE **/*.py
   python/pyspark/ml/stat.py:    <BLANKLINE>
   python/pyspark/ml/stat.py:    <BLANKLINE>
   python/pyspark/ml/stat.py:    <BLANKLINE>
   python/pyspark/ml/stat.py:    <BLANKLINE>
   python/pyspark/mllib/tree.py:        <BLANKLINE>
   python/pyspark/mllib/tree.py:        <BLANKLINE>
   python/pyspark/mllib/tree.py:        <BLANKLINE>
   python/pyspark/mllib/tree.py:        <BLANKLINE>
   python/pyspark/mllib/tree.py:        <BLANKLINE>
   python/pyspark/pandas/frame.py:        <BLANKLINE>
   python/pyspark/pandas/series.py:        <BLANKLINE>
   python/pyspark/pandas/series.py:        <BLANKLINE>
   python/pyspark/pandas/series.py:        <BLANKLINE>
   python/pyspark/pandas/series.py:        <BLANKLINE>
   python/pyspark/sql/dataframe.py:        <BLANKLINE>
   ```
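The grep above lists every matching line. A small stdlib-only sketch that summarises the same search per file, e.g. to track which files a follow-up PR still needs to touch. The function name `count_blankline_markers` is hypothetical, and matching the full `<BLANKLINE>` marker (rather than grep's bare `BLANKLINE`) is an assumption of this sketch.

```python
from collections import Counter
from pathlib import Path

# The doctest marker for an expected empty output line.
MARKER = "<BLANKLINE>"


def count_blankline_markers(root: str) -> Counter:
    """Count, per .py file under `root`, the lines that contain the marker."""
    counts = Counter()
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        hits = sum(MARKER in line for line in lines)
        if hits:
            # Key by the path relative to `root`, with forward slashes.
            counts[path.relative_to(root).as_posix()] = hits
    return counts
```

From a Spark checkout, `count_blankline_markers("python/pyspark").most_common()` would list the files with the most markers first.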




[GitHub] [spark] Yikun commented on a diff in pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #37465:
URL: https://github.com/apache/spark/pull/37465#discussion_r944005915


##########
python/pyspark/sql/types.py:
##########
@@ -684,6 +702,28 @@ class StructType(DataType):
     ...     StructField("f2", IntegerType(), False)])
     >>> struct1 == struct2
     False
+
+    The below example demonstrates how to create a struct using class:`StructType`
+    and class:`StructField` on DataFrame:

Review Comment:
   Thanks, addressed.




[GitHub] [spark] dcoliversun commented on a diff in pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #37465:
URL: https://github.com/apache/spark/pull/37465#discussion_r943138891


##########
python/pyspark/sql/types.py:
##########
@@ -2137,7 +2181,9 @@ def _test() -> None:
     sc = SparkContext("local[4]", "PythonTest")
     globs["sc"] = sc

Review Comment:
   Is `sc` redundant? I can't find where it is used. If it is, it's better to remove it.
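A minimal stdlib-only sketch of how one might check for unused harness globals like `sc` mechanically: parse every doctest example with `ast` and see which injected names are ever read. The docstring `DOC` and the helpers `referenced_names` / `unused_globals` are hypothetical. The examples are only parsed, never executed, so PySpark is not needed. This is a heuristic: names reached indirectly (e.g. via `globals()` or string evaluation) would be missed.

```python
import ast
import doctest

# Hypothetical docstring in the style of the pyspark.sql.types examples.
DOC = """
>>> df = spark.createDataFrame([("Alice", 1)], ["name", "age"])
>>> df.count()
1
"""


def referenced_names(docstring: str) -> set:
    """Collect every bare name that the doctest examples in `docstring` read."""
    names = set()
    for example in doctest.DocTestParser().get_examples(docstring):
        for node in ast.walk(ast.parse(example.source)):
            # Only loads count; `df = ...` stores `df` and does not use it.
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                names.add(node.id)
    return names


def unused_globals(docstrings, injected) -> list:
    """Return the injected harness names that no example ever reads."""
    used = set()
    for doc in docstrings:
        used |= referenced_names(doc)
    return sorted(set(injected) - used)


print(unused_globals([DOC], ["sc", "spark"]))  # ['sc']
```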




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37465:
URL: https://github.com/apache/spark/pull/37465#discussion_r943071346


##########
python/pyspark/sql/types.py:
##########
@@ -1108,6 +1153,8 @@ def _parse_datatype_json_string(json_string: str) -> DataType:
 
     Examples
     --------
+    >>> from pyspark.sql.types import *
+    >>> from pyspark.sql.types import _all_atomic_types, _parse_datatype_json_string

Review Comment:
   ditto



##########
python/pyspark/sql/types.py:
##########
@@ -511,6 +519,14 @@ class MapType(DataType):
 
     Examples
     --------
+    >>> from pyspark.sql.types import IntegerType, FloatType, MapType, StringType
+
+    The below example demonstrates how to create MapType:
+
+    >>> map_type = MapType(StringType(), IntegerType())
+
+    The values of the map can contain null (None) values by default:

Review Comment:
   ```suggestion
       The values of the map can contain null (``None``) values by default:
   ```



##########
python/pyspark/sql/types.py:
##########
@@ -511,6 +519,14 @@ class MapType(DataType):
 
     Examples
     --------
+    >>> from pyspark.sql.types import IntegerType, FloatType, MapType, StringType
+
+    The below example demonstrates how to create MapType:

Review Comment:
   ```suggestion
       The below example demonstrates how to create class:`MapType`:
   ```



##########
python/pyspark/sql/types.py:
##########
@@ -1036,6 +1080,7 @@ def _parse_datatype_string(s: str) -> DataType:
 
     Examples
     --------
+    >>> from pyspark.sql.types import _parse_datatype_string

Review Comment:
   This is an internal example so let's just leave it.



##########
python/pyspark/sql/types.py:
##########
@@ -1649,6 +1696,8 @@ def _make_type_verifier(
 
     Examples
     --------
+    >>> from pyspark.sql.types import *
+    >>> from pyspark.sql.types import _make_type_verifier

Review Comment:
   ditto



##########
python/pyspark/sql/types.py:
##########
@@ -684,6 +702,30 @@ class StructType(DataType):
     ...     StructField("f2", IntegerType(), False)])
     >>> struct1 == struct2
     False
+
+    The below example demonstrates how to create a struct using StructType & StructField on
+    DataFrame:
+
+    >>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
+    >>> schema = StructType([
+    ...     StructField("name", StringType()),
+    ...     StructField("languagesSkills", ArrayType(StringType())),
+    ... ])
+    >>> df = spark.createDataFrame(data=data, schema=schema)
+    >>> df.printSchema()
+    root
+     |-- name: string (nullable = true)
+     |-- languagesSkills: array (nullable = true)
+     |    |-- element: string (containsNull = true)
+    <BLANKLINE>

Review Comment:
   You can remove this `<BLANKLINE>` and fix the bottom as below:
   
   ```
   ...
   optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE)
   ```
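A minimal stdlib-only sketch of why the suggested option flags make the `<BLANKLINE>` unnecessary: with `doctest.NORMALIZE_WHITESPACE`, doctest compares expected and actual output after collapsing runs of whitespace, so the trailing empty line that the `printSchema()` example ends with no longer has to be written out. `show_schema` is a hypothetical stand-in for `DataFrame.printSchema()` so the sketch runs without PySpark.

```python
import doctest


def show_schema() -> None:
    """Hypothetical stand-in for printSchema(): a tree, then an empty line."""
    print("root")
    print(" |-- name: string (nullable = true)")
    print()


# Expected output that spells out the trailing empty line with the marker.
WITH_MARKER = """
>>> show_schema()
root
 |-- name: string (nullable = true)
<BLANKLINE>
"""

# The same example with the <BLANKLINE> marker removed.
WITHOUT_MARKER = """
>>> show_schema()
root
 |-- name: string (nullable = true)
"""


def failures(docstring: str, optionflags: int = 0) -> int:
    """Run one docstring as a doctest under `optionflags`; count failures."""
    test = doctest.DocTestParser().get_doctest(
        docstring, {"show_schema": show_schema}, "schema", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    return runner.run(test, out=lambda _: None).failed


FLAGS = doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE
print(failures(WITH_MARKER))            # 0: the marker matches the empty line
print(failures(WITHOUT_MARKER))         # 1: the trailing empty line is unexpected
print(failures(WITHOUT_MARKER, FLAGS))  # 0: whitespace runs compared loosely
```

One trade-off: `NORMALIZE_WHITESPACE` relaxes every whitespace difference in the affected examples, not just trailing empty lines, which is usually fine for documentation doctests but can mask spacing changes in output.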




[GitHub] [spark] HyukjinKwon closed pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained
URL: https://github.com/apache/spark/pull/37465



[GitHub] [spark] xinrong-meng commented on a diff in pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
xinrong-meng commented on code in PR #37465:
URL: https://github.com/apache/spark/pull/37465#discussion_r943744049


##########
python/pyspark/sql/types.py:
##########
@@ -684,6 +702,28 @@ class StructType(DataType):
     ...     StructField("f2", IntegerType(), False)])
     >>> struct1 == struct2
     False
+
+    The below example demonstrates how to create a struct using class:`StructType`
+    and class:`StructField` on DataFrame:

Review Comment:
   nit: `create a struct ... on DataFrame` may not be very clear; how about `create a DataFrame based on a struct created using ...`?




[GitHub] [spark] HyukjinKwon commented on pull request #37465: [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37465:
URL: https://github.com/apache/spark/pull/37465#issuecomment-1212620259

   Merged to master.

