Posted to dev@flink.apache.org by Greg Hogan <co...@greghogan.com> on 2016/02/22 19:18:37 UTC

Key expressions vs case class fields

Hi,

Looking at the documentation for "Transformations on Grouped DataSet" [1],
what differentiates a key expression from case class fields? Is there a
special Scala capability or are we still just passing strings?

[1]
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html#transformations-on-grouped-dataset

Thanks,
Greg

Re: Key expressions vs case class fields

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Greg,

From a user's point of view, expression keys (dataSet.groupBy("_1")) and
selector function keys (dataSet.groupBy(_._1)) look very similar in a Scala
DataSet or DataStream program. This is due to Scala's placeholder shortcut
for defining lambda functions.

However, the two key types are handled differently when the program is
executed. The expression key "_1" defines the logical position of the key
in the type of the data set, and the key fields are accessed by a properly
configured TypeComparator. The lambda function _._1 is a shortcut for
x => x._1 and is treated as a regular key selector function, i.e., during
plan translation we inject a MapFunction to evaluate the selector function
and extract the key.
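
To make the distinction concrete, here is a minimal sketch in plain Scala
collections. It does not use Flink and is not how Flink implements either
key type; the object KeyStyles and the helpers fieldIndex, groupByExpression
and groupBySelector are hypothetical names invented for this illustration.
It only shows that a string key is resolved to a field position, while
_._1 is an ordinary function equivalent to x => x._1.

```scala
// Illustration only: plain Scala collections, not Flink's internals.
object KeyStyles {

  // Hypothetical helper: resolve an expression key such as "_1" to a
  // zero-based field position within a tuple or case class.
  def fieldIndex(expr: String): Int = {
    require(expr.matches("_[1-9][0-9]*"), s"unsupported key expression: $expr")
    expr.drop(1).toInt - 1
  }

  // Expression-style key: the key is read by position from each record.
  def groupByExpression[T <: Product](data: Seq[T], expr: String): Map[Any, Seq[T]] = {
    val idx = fieldIndex(expr)
    data.groupBy(_.productElement(idx))
  }

  // Selector-style key: the key is whatever the user function returns.
  def groupBySelector[T, K](data: Seq[T])(selector: T => K): Map[K, Seq[T]] =
    data.groupBy(selector)

  def main(args: Array[String]): Unit = {
    val data = Seq(("a", 1), ("b", 2), ("a", 3))

    val byExpression = groupByExpression(data, "_1")
    val byShortcut   = groupBySelector(data)(_._1)      // placeholder syntax
    val byLambda     = groupBySelector(data)(x => x._1) // the same function, spelled out

    println(byExpression)
    println(byShortcut == byLambda)
    println(byExpression == byShortcut)
  }
}
```

In this sketch all three calls produce the same groups. The difference is
only in how the key is obtained: by position lookup for the string, by
calling a function for the selector.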

Does this answer your question?

Best, Fabian
