Posted to issues@flink.apache.org by "Sebastian Klemke (JIRA)" <ji...@apache.org> on 2017/06/25 20:37:00 UTC

[jira] [Created] (FLINK-7002) Partitioning broken if enum is used in compound key specified using field expression

Sebastian Klemke created FLINK-7002:
---------------------------------------

             Summary: Partitioning broken if enum is used in compound key specified using field expression
                 Key: FLINK-7002
                 URL: https://issues.apache.org/jira/browse/FLINK-7002
             Project: Flink
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.3.1, 1.2.0
            Reporter: Sebastian Klemke


When groupBy() or keyBy() is used with multiple field expressions and at least one of the fields is an enum type serialized using EnumTypeInfo, partitioning appears to be random, which results in incorrectly grouped/keyed output DataSets/DataStreams.

The attached Flink DataSet API jobs and the test dataset demonstrate the issue. Both jobs count (id, type) occurrences: TestJob groups using field expressions, while WorkingTestJob uses a KeySelector function.

The expected output for both jobs is 6 records, each with a frequency value of 100_000. When run in a LocalEnvironment, the results are in fact equivalent. When run on a cluster with 5 TaskManagers, however, only the KeySelector function with a String key produces correct results, whereas the field expressions produce random, non-repeatable, wrong results.
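
For illustration only, here is a minimal plain-Java sketch of the String-key approach that works on the cluster. It does not use Flink and is not the attached WorkingTestJob: the names RecordType, Rec, CompoundKeySketch, stringKey and countByKey, and the "|" separator, are all hypothetical stand-ins, since the attached jobs are not shown in this message. The idea is that the compound (id, type) key is flattened into a String built from the enum constant's name, so the key depends only on content. The body of stringKey is roughly what a KeySelector returning a String would compute.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical enum standing in for the "type" field of the test records.
enum RecordType { A, B, C }

// Hypothetical record type with the two fields that form the compound key.
final class Rec {
    final String id;
    final RecordType type;

    Rec(String id, RecordType type) {
        this.id = id;
        this.type = type;
    }
}

final class CompoundKeySketch {
    // Flattens (id, type) into one String key. The enum contributes only its
    // constant name, so equal (id, type) pairs always yield equal Strings.
    // The "|" separator is an assumption and must not occur in real ids.
    static String stringKey(Rec r) {
        return r.id + "|" + r.type.name();
    }

    // Counts occurrences per compound key, as both attached jobs are
    // described as doing. TreeMap only makes the result order stable.
    static Map<String, Long> countByKey(List<Rec> records) {
        Map<String, Long> counts = new TreeMap<>();
        for (Rec r : records) {
            counts.merge(stringKey(r), 1L, Long::sum);
        }
        return counts;
    }
}
```

With 2 ids and 3 enum constants, countByKey yields 6 entries, matching the shape of the expected output described above. Note that String.hashCode() is specified in terms of the string's characters, whereas Enum.hashCode() is the identity hash inherited from Object and may differ between JVM processes. This sketch does not claim that this difference is the cause of the reported behavior.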



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)