
Posted to issues@spark.apache.org by "Xinrong Meng (Jira)" <ji...@apache.org> on 2022/04/27 17:35:00 UTC

[jira] [Created] (SPARK-39048) Refactor GroupBy._reduce_for_stat_function on accepted data types

Xinrong Meng created SPARK-39048:
------------------------------------

             Summary: Refactor GroupBy._reduce_for_stat_function on accepted data types 
                 Key: SPARK-39048
                 URL: https://issues.apache.org/jira/browse/SPARK-39048
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: Xinrong Meng


`GroupBy._reduce_for_stat_function` is a common helper function used by multiple statistical functions of GroupBy objects.

It takes the parameters `only_numeric` and `bool_as_numeric` to control which Spark column types are accepted.

To be consistent with the pandas API, we may also have to introduce a `str_as_numeric` parameter, for example for `sum`.

Instead of introducing a dedicated parameter for each Spark type, this PR proposes a single parameter, `accepted_spark_types`, that specifies which types of Spark columns may be aggregated.
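A minimal sketch of the idea, written in plain Python so it runs without a Spark installation. The type classes are stand-ins that mirror the relevant shape of the `pyspark.sql.types` hierarchy (where `LongType` and `DoubleType` subclass `NumericType`, while `BooleanType` and `StringType` do not), and `columns_to_aggregate` is a hypothetical helper, not the actual PySpark code. It shows how one tuple of accepted types can replace a separate boolean flag per type.

```python
# Stand-ins mirroring part of the pyspark.sql.types hierarchy (assumption:
# only what this sketch needs). LongType/DoubleType are NumericType subclasses;
# BooleanType and StringType are not.
class DataType: ...
class NumericType(DataType): ...
class LongType(NumericType): ...
class DoubleType(NumericType): ...
class BooleanType(DataType): ...
class StringType(DataType): ...


def columns_to_aggregate(schema, accepted_spark_types):
    """Return names of columns whose type is an instance of accepted_spark_types.

    Hypothetical helper: `schema` is a list of (column_name, data_type) pairs and
    `accepted_spark_types` is a tuple of DataType subclasses.
    """
    return [
        name
        for name, data_type in schema
        if isinstance(data_type, accepted_spark_types)
    ]


schema = [
    ("a", LongType()),
    ("b", DoubleType()),
    ("c", BooleanType()),
    ("d", StringType()),
]

# Presumably what only_numeric=True, bool_as_numeric=True expresses today.
print(columns_to_aggregate(schema, (NumericType, BooleanType)))  # ['a', 'b', 'c']

# A `sum` that also accepts strings, with no new str_as_numeric flag.
print(columns_to_aggregate(schema, (NumericType, BooleanType, StringType)))  # ['a', 'b', 'c', 'd']
```

Because `isinstance` accepts a tuple of classes, supporting another type means adding it to the tuple at the call site rather than adding a new keyword argument to the shared helper.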



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
