Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/13 01:37:20 UTC

[GitHub] [spark] itholic commented on a diff in pull request #37850: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 7, ~30 functions)

itholic commented on code in PR #37850:
URL: https://github.com/apache/spark/pull/37850#discussion_r969057190


##########
python/pyspark/sql/functions.py:
##########
@@ -5619,6 +5708,10 @@ def element_at(col: "ColumnOrName", extraction: Any) -> Column:
     >>> df = spark.createDataFrame([(["a", "b", "c"],)], ['data'])
     >>> df.select(element_at(df.data, 1)).collect()
     [Row(element_at(data, 1)='a')]
+    >>> df.select(element_at(df.data, -1)).collect()
+    [Row(element_at(data, -1)='c')]
+    >>> df.select(element_at(df.data, -4)).collect()
+    [Row(element_at(data, -4)=None)]

Review Comment:
   Can we add a short description why this returns `None`?
   
   e.g.
   
   ```
       >>> df.select(element_at(df.data, -1)).collect()
       [Row(element_at(data, -1)='c')]
   
       Returns `None` if there is no value corresponding to the given `extraction`.
   
       >>> df.select(element_at(df.data, -4)).collect()
       [Row(element_at(data, -4)=None)]
   ```
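For readers skimming the thread, the behavior being documented can be mimicked in plain Python. This is an illustrative analogue of `element_at`'s array semantics (1-based indexing, negative indices counting from the end, `None` for an out-of-bounds index), not the PySpark implementation; the helper name is made up:

```python
# Plain-Python sketch of element_at's documented array behavior:
# SQL-style 1-based indexing, negative indices count from the end,
# and None (SQL NULL) when the index is out of bounds.
def element_at_py(data, extraction):
    if extraction == 0:
        raise ValueError("SQL array indices start at 1")
    if extraction > 0:
        # 1-based positive index: valid only up to len(data)
        return data[extraction - 1] if extraction <= len(data) else None
    # negative index: valid only down to -len(data)
    return data[extraction] if -extraction <= len(data) else None

print(element_at_py(["a", "b", "c"], 1))   # 'a'
print(element_at_py(["a", "b", "c"], -1))  # 'c'
print(element_at_py(["a", "b", "c"], -4))  # None
```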



##########
python/pyspark/sql/functions.py:
##########
@@ -6144,6 +6338,11 @@ def schema_of_csv(csv: "ColumnOrName", options: Optional[Dict[str, str]] = None)
 
         .. # noqa
 
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        a string representatio of a :class:`StructType` parsed from given CSV.

Review Comment:
   > typo in a few places: representatio -> representation
   
   Here, too
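To make the docstring's point concrete: the return value is a plain string describing a struct schema, not a `StructType` object. The toy helper below (a hypothetical name, and a far cruder type inference than Spark's) only illustrates the shape of that string:

```python
# Toy sketch of the idea behind schema_of_csv: infer a DDL-style
# string representation of a struct schema from one CSV row.
# Illustrative only -- not the PySpark implementation.
def infer_csv_schema(csv_row):
    fields = []
    for i, token in enumerate(csv_row.split(",")):
        try:
            int(token)
            typ = "INT"
        except ValueError:
            typ = "STRING"
        fields.append(f"_c{i}: {typ}")
    return "STRUCT<" + ", ".join(fields) + ">"

print(infer_csv_schema("1,abc"))  # STRUCT<_c0: INT, _c1: STRING>
```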



##########
python/pyspark/sql/functions.py:
##########
@@ -5554,19 +5603,40 @@ def array_join(
 def concat(*cols: "ColumnOrName") -> Column:
     """
     Concatenates multiple input columns together into a single column.
-    The function works with strings, binary and compatible array columns.
+    The function works with strings, numeric, binary and compatible array columns.
+    Or any type that can be converted to string is good candidate as input value.
 
     .. versionadded:: 1.5.0
 
+    Parameters
+    ----------
+    cols : :class:`~pyspark.sql.Column` or str
+        target column or columns to work on.
+
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        concatatened values. Type of the `Column` depends on input columns' type.

Review Comment:
   Another typo for "concatatened" here :-)
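As a plain-Python sketch of the semantics this docstring describes: `concat` joins its inputs, the result type follows the input type (strings concatenate, arrays concatenate), and a `None` (SQL NULL) input makes the whole result `None`. Illustrative analogue only, not the PySpark implementation:

```python
# Sketch of concat semantics: concatenate all inputs; the result is
# None (SQL NULL) if any input is None.
def concat_py(*cols):
    if any(c is None for c in cols):
        return None
    result = cols[0]
    for c in cols[1:]:
        result = result + c
    return result

print(concat_py("abcd", "123"))    # 'abcd123'
print(concat_py([1, 2], [3, 4]))   # [1, 2, 3, 4]
print(concat_py("abcd", None))     # None
```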



##########
python/pyspark/sql/functions.py:
##########
@@ -5554,19 +5603,40 @@ def array_join(
 def concat(*cols: "ColumnOrName") -> Column:
     """
     Concatenates multiple input columns together into a single column.
-    The function works with strings, binary and compatible array columns.
+    The function works with strings, numeric, binary and compatible array columns.
+    Or any type that can be converted to string is good candidate as input value.
 
     .. versionadded:: 1.5.0
 
+    Parameters
+    ----------
+    cols : :class:`~pyspark.sql.Column` or str
+        target column or columns to work on.
+
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        concatatened values. Type of the `Column` depends on input columns' type.
+
+    See Also
+    --------
+    :meth:`pyspark.sql.functions.array_join` : to concatanate string columns with delimiter

Review Comment:
   ditto
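The See Also entry contrasts `concat` with `array_join`; the difference is that `array_join` takes a delimiter and, per its documented behavior, drops `None` elements unless a null replacement is supplied. A plain-Python analogue (not the PySpark implementation):

```python
# Sketch of array_join semantics: join array elements with a
# delimiter; None elements are skipped unless null_replacement
# is provided.
def array_join_py(arr, delimiter, null_replacement=None):
    parts = []
    for e in arr:
        if e is None:
            if null_replacement is not None:
                parts.append(null_replacement)
        else:
            parts.append(str(e))
    return delimiter.join(parts)

print(array_join_py(["a", None, "b"], ","))          # a,b
print(array_join_py(["a", None, "b"], ",", "NULL"))  # a,NULL,b
```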



##########
python/pyspark/sql/functions.py:
##########
@@ -5554,19 +5603,40 @@ def array_join(
 def concat(*cols: "ColumnOrName") -> Column:
     """
     Concatenates multiple input columns together into a single column.
-    The function works with strings, binary and compatible array columns.
+    The function works with strings, numeric, binary and compatible array columns.
+    Or any type that can be converted to string is good candidate as input value.

Review Comment:
   If there are supported types other than string, numeric, and binary, can we list them all ?



##########
python/pyspark/sql/functions.py:
##########
@@ -5467,6 +5487,11 @@ def array_contains(col: "ColumnOrName", value: Any) -> Column:
     value :
         value or column to check for in array
 
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        a column of `boolean` type.

Review Comment:
   nit: if we want to use single quotation for type name, why don't we use it in other docstrings ?
   
   e.g. 
   ```diff
   -  a column of array type.
   +  a column of `array` type.
   ```



##########
python/pyspark/sql/functions.py:
##########
@@ -5514,6 +5544,11 @@ def slice(
     length : :class:`~pyspark.sql.Column` or str or int
         column name, column, or int containing the length of the slice
 
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        a column of array type. Subset of array.
+

Review Comment:
   nit: since we're here, can we also fix the minor mistake in description?
   
   I found there are two spaces between "containing" and "all".
   
   ```diff
   -  Collection function: returns an array containing  all the elements in `x` from index `start`
   +  Collection function: returns an array containing all the elements in `x` from index `start`
   ```
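For reference while reviewing the `slice` docstring: the function uses SQL-style 1-based `start`, with a negative `start` counting from the end of the array. A plain-Python sketch of that indexing (illustrative only, not the PySpark implementation):

```python
# Sketch of slice semantics: 1-based start index (negative counts
# from the end), returning `length` elements as a subset array.
def slice_py(x, start, length):
    if start == 0:
        raise ValueError("SQL array indices start at 1")
    idx = start - 1 if start > 0 else len(x) + start
    return x[idx:idx + length]

print(slice_py([1, 2, 3, 4], 2, 2))   # [2, 3]
print(slice_py([1, 2, 3, 4], -2, 2))  # [3, 4]
```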



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org