Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2016/02/12 21:56:15 UTC

GroupedDataset flatMapGroups with sorting (aka secondary sort redux)

is there a way to leverage the shuffle in Dataset/GroupedDataset so that
Iterator[V] in flatMapGroups has a well defined ordering?

it is hard for me to see many good use cases for flatMapGroups and mapGroups
if you do not have sorting.

since spark has a sort-based shuffle, not exposing this would be a missed
opportunity, not unlike SPARK-3655
<https://issues.apache.org/jira/browse/SPARK-3655>. And unlike with RDD
where this could be implemented in an external library without too much
trouble, i think with Dataset it is hard for a spark "user" to add this
functionality.
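
to make the ask concrete, here is a rough sketch of the per-partition half of a
secondary sort, written against plain scala iterators so it runs without spark.
it assumes the input iterator is already sorted by (key, value). on an RDD that
ordering would typically come from repartitionAndSortWithinPartitions with a
partitioner that looks at the key only, and this function would then be called
inside mapPartitions. the names SecondarySortSketch and flatMapSortedGroups are
made up for illustration and are not part of spark, and unlike a real
implementation this version buffers one group's values in memory at a time.

```scala
object SecondarySortSketch {
  // Hypothetical helper, not a Spark API.
  // Assumes `sorted` is already ordered by key, then by value (as a sort-based
  // shuffle could provide). Calls `f` once per run of equal keys, handing it
  // that key's values in their sorted order.
  def flatMapSortedGroups[K, V, U](sorted: Iterator[(K, V)])(
      f: (K, Iterator[V]) => Iterator[U]): Iterator[U] = {
    val in = sorted.buffered

    // Pulls all consecutive values that share the key at the head of `in`.
    // Simplification: the group is buffered in a Vector rather than streamed.
    def nextGroup(): Iterator[U] = {
      val key = in.head._1
      val values = Vector.newBuilder[V]
      while (in.hasNext && in.head._1 == key) values += in.next()._2
      f(key, values.result().iterator)
    }

    new Iterator[U] {
      private var current: Iterator[U] = Iterator.empty

      def hasNext: Boolean = {
        // Skip over groups for which `f` produced no output.
        while (!current.hasNext && in.hasNext) current = nextGroup()
        current.hasNext
      }

      def next(): U = {
        if (!hasNext) throw new NoSuchElementException("no more elements")
        current.next()
      }
    }
  }
}

object SecondarySortSketchDemo extends App {
  val events = Seq(("a", 3), ("b", 1), ("a", 1), ("a", 2))
  // Sorting the tuples locally stands in for the shuffle's (key, value) ordering.
  val sorted = events.sorted.iterator
  val out = SecondarySortSketch
    .flatMapSortedGroups(sorted) { (k, vs) => Iterator(k -> vs.toList) }
    .toList
  println(out) // List((a,List(1, 2, 3)), (b,List(1)))
}
```

the point of the sketch is that the function passed in sees each key's values
in a well defined order, which is what i would like flatMapGroups to be able
to offer.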