Posted to dev@flink.apache.org by "Sebastian Kruse (JIRA)" <ji...@apache.org> on 2015/06/10 09:32:00 UTC

[jira] [Created] (FLINK-2193) Partial shuffling

Sebastian Kruse created FLINK-2193:
--------------------------------------

             Summary: Partial shuffling
                 Key: FLINK-2193
                 URL: https://issues.apache.org/jira/browse/FLINK-2193
             Project: Flink
          Issue Type: Improvement
            Reporter: Sebastian Kruse
            Priority: Minor


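In some cases, it would be handy to shuffle only specific elements of a dataset instead of all of them. This is currently not achievable with a custom partitioner, because a partitioner does not know which partition an element currently resides in and therefore cannot decide to leave it there.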

Use cases for such a feature are:
* Load balancing: split up elements that require high processing load and distribute the splits among all task managers.
* Evolutionary algorithms: A well-suited EA model for Map/Reduce-like platforms is the island model, where each worker maintains and evolves its own population. From time to time, some individuals need to be exchanged between the populations. Shuffling the complete populations is not necessary, though.

A presumably easy way to achieve this feature could be to provide the local partition number in deployed partitioners, similar to {{RichFunction#getRuntimeContext()#getIndexOfThisSubtask()}}.
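To illustrate the idea, here is a minimal, self-contained sketch of what a partitioner with access to the local partition number might look like. Everything in it is an assumption made for illustration: the interface {{LocalAwarePartitioner}}, its extra {{localPartition}} parameter, and the two example classes are hypothetical and not part of Flink's API. The first partitioner covers the load-balancing use case, the second the island-model use case.

```java
final class PartialShuffleSketch {

    /**
     * Hypothetical partitioner interface (NOT an existing Flink interface).
     * Besides the key and the number of partitions, it also receives the
     * index of the partition the element currently resides in.
     */
    interface LocalAwarePartitioner<K> {
        int partition(K key, int numPartitions, int localPartition);
    }

    /**
     * Load balancing: the key is an assumed per-element cost estimate.
     * Cheap elements stay where they are; only expensive ones are shuffled.
     */
    static final class HeavyOnlyPartitioner implements LocalAwarePartitioner<Integer> {
        private final int threshold;

        HeavyOnlyPartitioner(int threshold) {
            this.threshold = threshold;
        }

        @Override
        public int partition(Integer cost, int numPartitions, int localPartition) {
            if (cost < threshold) {
                return localPartition; // no network transfer needed
            }
            // Spread expensive elements over all partitions.
            return Math.floorMod(cost.hashCode(), numPartitions);
        }
    }

    /**
     * Island model: the key is an assumed flag marking an individual as a
     * migrant. Migrants move to the next island in a ring; all other
     * individuals stay in their local population.
     */
    static final class IslandMigrationPartitioner implements LocalAwarePartitioner<Boolean> {
        @Override
        public int partition(Boolean migrant, int numPartitions, int localPartition) {
            return migrant ? (localPartition + 1) % numPartitions : localPartition;
        }
    }
}
```

In both cases, returning {{localPartition}} is what turns a full shuffle into a partial one. How the local index would actually be handed to the partitioner (an extra method parameter as above, or some runtime context as with rich functions) is left open here.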



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)