Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/07 14:22:06 UTC

[GitHub] [spark] Yikun commented on a diff in pull request #37816: [SPARK-40332][PS] Implement `GroupBy.quantile`

Yikun commented on code in PR #37816:
URL: https://github.com/apache/spark/pull/37816#discussion_r964909876


##########
python/pyspark/pandas/groupby.py:
##########
@@ -581,6 +581,56 @@ def mean(self, numeric_only: Optional[bool] = True) -> FrameLike:
             F.mean, accepted_spark_types=(NumericType,), bool_to_numeric=True
         )
 
+    # TODO: 'q' accepts list like type
+    def quantile(self, q: float = 0.5) -> FrameLike:
+        """
+        Return group values at the given quantile.
+
+        .. note:: `quantile` in pandas-on-Spark uses a distributed percentile approximation
+        algorithm, unlike pandas, so the result may differ from pandas in accuracy. The
+        `interpolation` parameter is not supported yet.
+
+        Parameters
+        ----------
+        q : float, default 0.5 (50% quantile)
+            Value between 0 and 1 providing the quantile to compute.
+
+            .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        pyspark.pandas.Series or pyspark.pandas.DataFrame
+
+        See Also
+        --------
+        pyspark.pandas.Series.quantile
+        pyspark.pandas.DataFrame.quantile
+        pyspark.sql.functions.percentile_approx
+
+        Examples
+        --------
+        >>> df = ps.DataFrame([
+        ...     ['a', 1], ['a', 2], ['a', 3],
+        ...     ['b', 1], ['b', 3], ['b', 5]
+        ... ], columns=['key', 'val'])
+
+        Groupby one column and return the quantile of the remaining columns in
+        each group.
+
+        >>> df.groupby('key').quantile()
+             val
+        key
+        a    2
+        b    3
+        """
+        if is_list_like(q):

Review Comment:
   pandas also supports a list-like `q` here, or am I missing something?
   
   https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.quantile.html#pandas-core-groupby-dataframegroupby-quantile
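
   For reference, a minimal sketch of the pandas behavior in question, reusing the data from the docstring example above. This runs against plain pandas rather than pandas-on-Spark, and the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1], ["a", 2], ["a", 3], ["b", 1], ["b", 3], ["b", 5]],
    columns=["key", "val"],
)

# Scalar q: one row per group.
scalar_result = df.groupby("key").quantile(0.5)

# List-like q: each requested quantile becomes an extra inner index level.
list_result = df.groupby("key").quantile([0.25, 0.75])
print(list_result)
```

   With a list-like `q`, pandas returns a frame indexed by both the group key and the quantile level, so supporting it in pandas-on-Spark would presumably need the same index shape.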



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

