You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2015/06/22 12:27:00 UTC

[jira] [Commented] (FLINK-2193) Partial shuffling

    [ https://issues.apache.org/jira/browse/FLINK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595725#comment-14595725 ] 

Stephan Ewen commented on FLINK-2193:
-------------------------------------

I think there is currently no partition function that can access the runtime context.

For now, this seems a bit like a very special case. Can you simulate this by computing the target partition in a {{RichMapFunction}} and then use an identity-Partitioner to decide the target channel?

One issue that you will encounter is that the i-th receiver has no affinity to the location of the i-th sender, because to the system it look like every sender task talks with every receiver task, and it does not know that you intend to send 95% of all data through one channel.

> Partial shuffling
> -----------------
>
>                 Key: FLINK-2193
>                 URL: https://issues.apache.org/jira/browse/FLINK-2193
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Sebastian Kruse
>            Priority: Minor
>
> In some cases, it would come in handy to shuffle only some specific elements of a dataset instead of all elements. This is currently not achievable with a custom partitioner.
> Use cases for such a feature are:
> * Load balancing: split up elements that require high processing load and distribute the splits among all task managers.
> * Evolutionary algorithms: A well-suited EA model for Map/Reduce-like platforms is the island model, where each worker maintains and evolves its own population. From time to time, individuals among the population need to be exchanged. Shuffling all the complete populations is not necessary, though.
> A presumably easy way to achieve this feature could be to provide the local partition number in deployed partitioners, similar to {{RichFunction#getRuntimeContext()#getIndexOfThisSubtask()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)