Posted to issues@flink.apache.org by "Sebastian Klemke (JIRA)" <ji...@apache.org> on 2017/06/25 20:37:00 UTC
[jira] [Created] (FLINK-7002) Partitioning broken if enum is used in compound key specified using field expression
Sebastian Klemke created FLINK-7002:
---------------------------------------
Summary: Partitioning broken if enum is used in compound key specified using field expression
Key: FLINK-7002
URL: https://issues.apache.org/jira/browse/FLINK-7002
Project: Flink
Issue Type: Bug
Components: Core
Affects Versions: 1.3.1, 1.2.0
Reporter: Sebastian Klemke
When groupBy() or keyBy() is used with multiple field expressions and at least one of them refers to an enum type serialized using EnumTypeInfo, partitioning appears to be random, which results in incorrectly grouped/keyed output DataSets/DataStreams.
The attached Flink DataSet API jobs and the test dataset demonstrate the issue. Both jobs count (id, type) occurrences: TestJob groups using field expressions, while WorkingTestJob groups using a KeySelector function.
The expected output for both jobs is 6 records, each with a frequency value of 100_000. When run in a LocalEnvironment, the results of both jobs are in fact equivalent. When run on a cluster with 5 TaskManagers, however, only the KeySelector function with a String key produces correct results, whereas field expressions produce random, non-repeatable, wrong results.
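This report does not identify a cause. One possibility worth checking, stated here as an assumption rather than a finding, is that java.lang.Enum.hashCode() is identity-based and so is not guaranteed to return the same value in different JVMs; a partitioner that hashes the enum object directly could then route the same (id, type) key to different subtasks depending on which TaskManager computed the hash, while a single-JVM LocalEnvironment would not show the problem. The plain-Java sketch below uses no Flink APIs; RecordType, stringKey, stableCompoundHash and partitionFor are hypothetical names chosen for illustration. It shows a compound key built only from values whose hash is defined by content (a long and the enum's name), similar in spirit to the String key that WorkingTestJob is described as using.

```java
public class EnumKeyHashSketch {

    // Hypothetical enum standing in for the "type" field of the test records.
    enum RecordType { A, B, C }

    // Builds a String key from id and the enum's name. String.hashCode() is
    // specified in terms of the characters, so it is the same in every JVM.
    static String stringKey(long id, RecordType type) {
        return id + "|" + type.name();
    }

    // Combines the hash of id with the hash of the enum's *name*, deliberately
    // avoiding type.hashCode(), which is identity-based for enums.
    static int stableCompoundHash(long id, RecordType type) {
        return 31 * Long.hashCode(id) + type.name().hashCode();
    }

    // Maps a hash to one of `parallelism` partitions; floorMod keeps the
    // result non-negative even for negative hashes. This is a simplification,
    // not the partitioning function Flink actually uses.
    static int partitionFor(int hash, int parallelism) {
        return Math.floorMod(hash, parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 5; // matches the 5 TaskManagers mentioned in the report
        for (RecordType type : RecordType.values()) {
            int stable = stableCompoundHash(1L, type);
            // The identity-based value printed last may differ between JVM runs.
            System.out.println(stringKey(1L, type)
                    + " -> partition " + partitionFor(stable, parallelism)
                    + " (enum identity hash: " + type.hashCode() + ")");
        }
    }
}
```

Running main in separate JVMs can be used to compare the two kinds of hash: the partition computed from the name-based hash stays the same across runs, whereas the printed enum identity hash is not guaranteed to.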
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)