Posted to issues@spark.apache.org by "Xiang Gao (JIRA)" <ji...@apache.org> on 2016/08/22 13:21:20 UTC

[jira] [Created] (SPARK-17185) Unify naming of API for RDD and Dataset

Xiang Gao created SPARK-17185:
---------------------------------

             Summary: Unify naming of API for RDD and Dataset
                 Key: SPARK-17185
                 URL: https://issues.apache.org/jira/browse/SPARK-17185
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
            Reporter: Xiang Gao


In RDD, groupByKey generates key-list pairs and aggregateByKey performs aggregation.
In Dataset, aggregation is done by groupBy and groupByKey, and no API for key-list pairs is provided.
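
To make the distinction above concrete, here is a minimal plain-Python sketch of the two behaviours. It does not use Spark at all: group_by_key and aggregate_by_key are hypothetical helper names chosen for illustration, and the single fold function is a simplification (the RDD aggregateByKey also takes a second function for merging partial results across partitions, which is omitted here because everything runs in one process).

```python
from collections import defaultdict


def group_by_key(pairs):
    """Key-list pairs: every value is kept, collected under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def aggregate_by_key(pairs, zero, seq_op):
    """Aggregation: the values of each key are folded into one result.

    `zero` is the starting accumulator for every key; it is assumed to be
    immutable (e.g. a number), since the same object is reused per key.
    """
    acc = {}
    for key, value in pairs:
        acc[key] = seq_op(acc.get(key, zero), value)
    return acc


pairs = [("a", 1), ("b", 2), ("a", 3)]

grouped = group_by_key(pairs)                                # key -> list of values
summed = aggregate_by_key(pairs, 0, lambda acc, v: acc + v)  # key -> single value

print(grouped)
print(summed)
```

The first result keeps all values per key, while the second keeps only one folded value per key; this is the difference the issue refers to when it contrasts a key-list pair with an aggregation.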

The same name "groupBy" does different things in the two APIs, which might be confusing. Besides, it would be more convenient to provide an API that generates key-list pairs for Dataset.


