Posted to issues@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2019/02/28 13:06:00 UTC
[jira] [Updated] (FLINK-7002) Partitioning broken if enum is used in compound key specified using field expression
[ https://issues.apache.org/jira/browse/FLINK-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Metzger updated FLINK-7002:
----------------------------------
Component/s: API / Type Serialization System (was: Core)
> Partitioning broken if enum is used in compound key specified using field expression
> ------------------------------------------------------------------------------------
>
> Key: FLINK-7002
> URL: https://issues.apache.org/jira/browse/FLINK-7002
> Project: Flink
> Issue Type: Bug
> Components: API / Type Serialization System
> Affects Versions: 1.2.0, 1.3.1
> Reporter: Sebastian Klemke
> Priority: Major
> Attachments: TestJob.java, WorkingTestJob.java, testdata.avro
>
>
> When groupBy() or keyBy() is used with multiple field expressions, at least one of which refers to an enum type serialized using EnumTypeInfo, partitioning appears to be random, resulting in incorrectly grouped or keyed output DataSets and DataStreams.
> The attached Flink DataSet API jobs and the test dataset demonstrate the issue. Both jobs count (id, type) occurrences: TestJob groups using field expressions, whereas WorkingTestJob uses a KeySelector function.
> The expected output for both jobs is 6 records, each with a frequency value of 100_000. When run in a LocalEnvironment, the results of the two jobs are in fact equivalent. When run on a cluster with 5 TaskManagers, however, only the KeySelector function with a String key produces correct results; the field expressions produce random, non-repeatable, wrong results.
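The String-key workaround described above can be sketched in plain Java, without any Flink dependency. This is a minimal illustration, not the attached WorkingTestJob: the class name, the `RecordType` enum, the `|` separator, and the modulo-based `partitionFor` helper are all hypothetical, and Flink's real partitioner works differently. The sketch only shows why a String key is a safe choice: `String.hashCode()` is defined by the string's contents, so every JVM computes the same value for the same (id, type) pair. By contrast, `Enum.hashCode()` is not specified to be stable across JVM processes. Whether that is the root cause of this issue is an assumption; the report does not say.

```java
public class CompositeKeySketch {

    // Hypothetical enum standing in for the "type" field of the test records.
    enum RecordType { A, B, C }

    // Build a String key from (id, type). The enum contributes its name(),
    // so the key's hash depends only on the characters in the string.
    static String compositeKey(long id, RecordType type) {
        return id + "|" + type.name();
    }

    // Illustrative partition assignment (NOT Flink's actual partitioner):
    // map the content-based String hash onto [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String key = compositeKey(42L, RecordType.A);
        // The same key maps to the same partition on every JVM.
        System.out.println(key + " -> partition " + partitionFor(key, 5));
    }
}
```

In a Flink job, the body of `compositeKey` would correspond to what a KeySelector returning a String does for each record.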