Posted to dev@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2020/02/11 02:35:00 UTC

[jira] [Created] (FLINK-15979) Fix the merged count is not accurate in CountDistinctWithMerge

Jark Wu created FLINK-15979:
-------------------------------

             Summary: Fix the merged count is not accurate in CountDistinctWithMerge 
                 Key: FLINK-15979
                 URL: https://issues.apache.org/jira/browse/FLINK-15979
             Project: Flink
          Issue Type: New Feature
          Components: Table SQL / Legacy Planner
            Reporter: Jark Wu


As discussed in the user ML: https://lists.apache.org/thread.html/rc4b06c9931656c94dc993b124da3ff00f04099e41201c64788936c24%40%3Cuser.flink.apache.org%3E.

The current implementation of {{org.apache.flink.table.runtime.utils.JavaUserDefinedAggFunctions.CountDistinctWithMerge#merge}} in the old planner is incorrect and produces a wrong merged count.

The test (org.apache.flink.table.runtime.stream.table.GroupWindowITCase#testEventTimeSessionGroupWindowOverTime) which uses this UDAF can't expose the bug because there are no distinct values in the test data.  

The class {{CountDistinctWithMerge}} is a test-only implementation, so this is not a critical problem. The Blink planner has a correct implementation: https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-blink/src/test/java/org/apache/flink/table/planner/plan/utils/JavaUserDefinedAggFunctions.java#L369
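
For illustration only, here is a minimal, self-contained sketch of how a count-distinct merge can overcount and how it can be made accurate. It is not the Flink code: the class {{CountDistinctMergeSketch}}, the accumulator {{Acc}}, and the methods {{naiveMerge}} and {{merge}} are hypothetical names, and a plain {{HashMap}} stands in for Flink's state-backed map. The naive variant shows one plausible failure mode (summing the per-accumulator counts), which is an assumption; this issue does not spell out the exact defect in the old planner.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CountDistinctMergeSketch {

    /** Hypothetical accumulator: occurrences per value plus the distinct count. */
    static final class Acc {
        final Map<String, Integer> seen = new HashMap<>();
        long count = 0L;
    }

    /** Adds one value; the distinct count grows only on first sight. */
    static void accumulate(Acc acc, String value) {
        if (value == null) {
            return;
        }
        Integer occurrences = acc.seen.get(value);
        if (occurrences == null) {
            acc.seen.put(value, 1);
            acc.count++;
        } else {
            acc.seen.put(value, occurrences + 1);
        }
    }

    /** Assumed failure mode: sums counts, so shared values are counted twice. */
    static void naiveMerge(Acc acc, Iterable<Acc> others) {
        for (Acc other : others) {
            acc.count += other.count;
            for (Map.Entry<String, Integer> e : other.seen.entrySet()) {
                acc.seen.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
    }

    /** Accurate merge: the count grows only for values new to the target. */
    static void merge(Acc acc, Iterable<Acc> others) {
        for (Acc other : others) {
            for (Map.Entry<String, Integer> e : other.seen.entrySet()) {
                Integer existing = acc.seen.get(e.getKey());
                if (existing == null) {
                    acc.seen.put(e.getKey(), e.getValue());
                    acc.count++;
                } else {
                    acc.seen.put(e.getKey(), existing + e.getValue());
                }
            }
        }
    }

    /** Builds an accumulator from the given values. */
    static Acc of(String... values) {
        Acc acc = new Acc();
        for (String v : values) {
            accumulate(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Both sides contain "b", so the true distinct count is 3.
        Acc naive = of("a", "b");
        naiveMerge(naive, Collections.singletonList(of("b", "c")));
        System.out.println("naive merge:    " + naive.count); // prints 4

        Acc accurate = of("a", "b");
        merge(accurate, Collections.singletonList(of("b", "c")));
        System.out.println("accurate merge: " + accurate.count); // prints 3
    }
}
```

In this sketch the two variants agree whenever the merged accumulators share no values, so a test needs the same value to appear on both sides of a merge before the difference becomes visible.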



--
This message was sent by Atlassian Jira
(v8.3.4#803005)