Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2017/07/03 15:54:00 UTC

[jira] [Commented] (FLINK-6936) Add multiple targets support for custom partitioner

    [ https://issues.apache.org/jira/browse/FLINK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072665#comment-16072665 ] 

Aljoscha Krettek commented on FLINK-6936:
-----------------------------------------

We can add a new interface {{MultiPartitioner}} (or some name to that effect) for that, thus not breaking the API.

I'm wondering whether it makes sense to add this now, without knowing whether we can (or will) ever have the other features required to make this work well, i.e. state migration.

> Add multiple targets support for custom partitioner
> ---------------------------------------------------
>
>                 Key: FLINK-6936
>                 URL: https://issues.apache.org/jira/browse/FLINK-6936
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Xingcan Cui
>            Assignee: Xingcan Cui
>            Priority: Minor
>
> The current user-facing Partitioner only allows returning one target.
> {code:java}
> @Public
> public interface Partitioner<K> extends java.io.Serializable, Function {
> 	/**
> 	 * Computes the partition for the given key.
> 	 *
> 	 * @param key The key.
> 	 * @param numPartitions The number of partitions to partition into.
> 	 * @return The partition index.
> 	 */
> 	int partition(K key, int numPartitions);
> }
> {code}
> In fact, this function should be able to return multiple partitions; the single-target restriction may be a historical legacy.
> There could be at least three approaches to solve this.
> # Make the `protected DataStream<T> setConnectionType(StreamPartitioner<T> partitioner)` method in DataStream public, which would allow users to define a StreamPartitioner directly.
> # Change the `partition` method in the Partitioner interface to return an int array instead of a single int value.
> # Add a new `multicast` method to DataStream and provide a MultiPartitioner interface which returns an int array.
> Considering API consistency, the third approach seems to be an acceptable choice. [~aljoscha], what do you think?
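
For illustration, the third approach might look roughly like the sketch below. None of this is existing Flink API: MultiPartitioner is only the name proposed in this issue, and SingleTargetPartitioner, MultiPartitioners, fromSingle and replicating are hypothetical names made up for the example. SingleTargetPartitioner merely mirrors the shape of the Partitioner interface quoted above, so that the snippet compiles without Flink on the classpath.

```java
import java.io.Serializable;

/** Hypothetical interface from this issue: returns several target partitions per key. */
interface MultiPartitioner<K> extends Serializable {
    /** Returns the indices of all partitions the key should be sent to. */
    int[] partition(K key, int numPartitions);
}

/** Stand-in with the same shape as the existing single-target Partitioner (assumption). */
interface SingleTargetPartitioner<K> extends Serializable {
    int partition(K key, int numPartitions);
}

/** Hypothetical helper class, used only for this example. */
final class MultiPartitioners {
    private MultiPartitioners() {}

    /** Wraps a single-target partitioner, so existing implementations stay usable. */
    static <K> MultiPartitioner<K> fromSingle(SingleTargetPartitioner<K> single) {
        return (key, numPartitions) -> new int[] {single.partition(key, numPartitions)};
    }

    /** Sends each key to its hash partition and to the following (replicas - 1) partitions. */
    static <K> MultiPartitioner<K> replicating(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be at least 1");
        }
        return (key, numPartitions) -> {
            // Never return more targets than there are partitions.
            int count = Math.min(replicas, numPartitions);
            // floorMod keeps the index non-negative for negative hash codes.
            int first = Math.floorMod(key.hashCode(), numPartitions);
            int[] targets = new int[count];
            for (int i = 0; i < count; i++) {
                targets[i] = (first + i) % numPartitions;
            }
            return targets;
        };
    }
}
```

A `multicast` method on DataStream, as proposed in the third approach, would then accept such a MultiPartitioner and emit each record to every returned index. Because the existing Partitioner interface is left untouched, current users would not be affected.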


