Posted to issues@flink.apache.org by "Tobias (JIRA)" <ji...@apache.org> on 2014/06/27 11:19:24 UTC

[jira] [Commented] (FLINK-925) Support KeySelector function returning Tuples

    [ https://issues.apache.org/jira/browse/FLINK-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045743#comment-14045743 ] 

Tobias commented on FLINK-925:
------------------------------

DataSet implements:
public Grouping<T> groupBy(int... fields) {
	return new Grouping<T>(this, new Keys.FieldPositionKeys<T>(fields, getType(), false));
}
That can be used to group Tuple data types, which are not themselves Comparable. Those Tuples need to consist of non-generic comparable types.
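
To illustrate what grouping on field positions means, here is a minimal plain-Java sketch. It is not the DataSet implementation: the class FieldPositionGrouping and its groupBy helper are hypothetical, rows are modelled as Object[] instead of Tuples, and groups are built by hashing the selected fields. An engine that sorts on the key fields instead needs a comparator for each key field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DataSet API.
public class FieldPositionGrouping {

    /** Groups rows by the values found at the given field positions. */
    static Map<List<Object>, List<Object[]>> groupBy(List<Object[]> rows, int... fields) {
        Map<List<Object>, List<Object[]>> groups = new LinkedHashMap<>();
        for (Object[] row : rows) {
            // The composite key is the list of values at the selected positions.
            List<Object> key = new ArrayList<>(fields.length);
            for (int f : fields) {
                key.add(row[f]);
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

Calling groupBy(rows, 0) groups on the first field, and groupBy(rows, 0, 1) groups on the pair of the first two fields.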

When I group on my comparable:
*DataSet<Tuple2<MyComparable, Integer>>.groupBy(0)*
This exception is thrown:
{color:red}
Exception in thread "main" java.lang.UnsupportedOperationException: Generic type comparators are not yet implemented.
	at eu.stratosphere.api.java.typeutils.GenericTypeInfo.createComparator(GenericTypeInfo.java:66)
{color}

When I group on the Integer field:
*DataSet<Tuple2<MyComparable, Integer>>.groupBy(1)*
This exception is thrown:
{color:red}
Exception in thread "main" eu.stratosphere.compiler.CompilerException: Error translating node 'GroupReduce "MAX(1)" : SORTED_GROUP_REDUCE [[ GlobalProperties [partitioning=RANDOM] ]] [[ LocalProperties [ordering=null, grouped=null, unique=null] ]]': Could not serialize comparator into the configuration.
{color}

Grouping with: *class MyComparable implements Comparable<MyComparable>*
{color:red}Exception in thread "main" java.lang.UnsupportedOperationException: Generic type comparators are not yet implemented.
	at eu.stratosphere.api.java.typeutils.GenericTypeInfo.createComparator(GenericTypeInfo.java:66){color}

I ran these tests in order to understand the problem. As far as I understand:
-> Tuple data types can be grouped when they contain only non-generic types.
-> All other generic types are not groupable, whether inside a Tuple or not.
-> Tuples that contain even one generic type are not groupable, regardless of which key field is used for grouping.

Does it make sense to remove the Comparable restriction, given that even some classes that fulfill it are not supported?
Tuples, on the other hand, can be grouped if they consist of the right types.
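
As a rough sketch of what replacing the compile-time Comparable bound with a later check could look like: all names below (KeyTypeCheck, KeyType, AtomicKey, TupleKey, requireComparableKey) are hypothetical and are not the existing type information classes. The sketch only models the rule "a Tuple key is usable if all of its fields are comparable". It does not reproduce the behaviour seen in the tests above, where a class implementing Comparable was still rejected as a generic type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the project's type system.
public class KeyTypeCheck {

    /** Stand-in for a key type descriptor. */
    interface KeyType {
        boolean isComparable();
    }

    /** A single-valued key: comparable if its class implements Comparable. */
    static final class AtomicKey implements KeyType {
        private final Class<?> clazz;

        AtomicKey(Class<?> clazz) {
            this.clazz = clazz;
        }

        @Override
        public boolean isComparable() {
            return Comparable.class.isAssignableFrom(clazz);
        }
    }

    /** A composite key: comparable only if every field is comparable. */
    static final class TupleKey implements KeyType {
        private final List<KeyType> fields;

        TupleKey(KeyType... fields) {
            this.fields = Arrays.asList(fields);
        }

        @Override
        public boolean isComparable() {
            for (KeyType field : fields) {
                if (!field.isComparable()) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Check performed when the plan is built, instead of in the method signature. */
    static void requireComparableKey(KeyType keyType) {
        if (!keyType.isComparable()) {
            throw new IllegalArgumentException("Key type is not comparable");
        }
    }
}
```

With such a check, a key made of a String and an Integer would pass, while a key containing a plain Object field would be rejected when the plan is built rather than by the compiler.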

> Support KeySelector function returning Tuples
> ---------------------------------------------
>
>                 Key: FLINK-925
>                 URL: https://issues.apache.org/jira/browse/FLINK-925
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 0.6-incubating
>            Reporter: Fabian Hueske
>            Assignee: Tobias
>            Priority: Minor
>              Labels: starter
>
> KeySelector functions are used to extract keys on which DataSets can be grouped or joined.
> Currently, the key types returned by KeySelector functions are restricted to be comparable. However, Flink's Tuple data types are not comparable (because this depends on the types of their fields), which makes grouping and joining on composite keys difficult.
> We should change the signature of the groupBy(), join(), and coGroup() methods to also allow non-comparable keys as return types of a KeySelector function.
> Instead we will check at optimization time whether the returned type is comparable (which is true for tuples if all elements are comparable).



--
This message was sent by Atlassian JIRA
(v6.2#6252)