Posted to dev@flink.apache.org by "Zhengchao Shi (Jira)" <ji...@apache.org> on 2020/09/29 09:44:00 UTC

[jira] [Created] (FLINK-19452) statistics of group by CDC data is always 1

Zhengchao Shi created FLINK-19452:
-------------------------------------

             Summary: statistics of group by CDC data is always 1
                 Key: FLINK-19452
                 URL: https://issues.apache.org/jira/browse/FLINK-19452
             Project: Flink
          Issue Type: Bug
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.11.1
            Reporter: Zhengchao Shi
             Fix For: 1.12.0


When using CDC to compute a count, if the source table (a MySQL table) receives only updates, the count value is always 1.
{code:sql}
CREATE TABLE orders (
  order_number int,
  product_id   int
) WITH (
  'connector' = 'kafka-0.11',
  'topic' = 'Topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'GroupId',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'canal-json'
);

CREATE TABLE order_test (
  order_number int,
  order_cnt bigint
) WITH (
  'connector' = 'print'
);

INSERT INTO order_test
SELECT order_number, count(1) FROM orders GROUP BY order_number;
{code}
The "orders" table contains 3 records:
||order_number||product_id||
|10001|1|
|10001|2|
|10001|3|

Now update the orders table:
{code:sql}
UPDATE orders SET product_id = 5 WHERE order_number = 10001;
{code}
The output is:

-D(10001,1)
+I(10001,1)
-D(10001,1)
+I(10001,1)
-D(10001,1)
+I(10001,1)

I think the final result should be +I(10001,3).
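
To make the expected behaviour concrete, here is a minimal Python sketch of a retraction-aware grouped count. It is not Flink's implementation. The function name grouped_count is hypothetical, and the input changelog is an assumption: three initial inserts, followed by the UPDATE represented as one -U/+U pair per affected row. Under that assumption the count for order_number 10001 should move between 2 and 3 and end at 3, never drop back to 1. The row kind on the last emitted record may differ from the +I above; the point here is the count value.

```python
from collections import defaultdict


def grouped_count(changelog):
    """Apply a changelog to a per-key COUNT; return (emitted rows, final counts)."""
    counts = defaultdict(int)
    emitted = []
    for kind, order_number, _product_id in changelog:
        old = counts[order_number]
        # +I / +U add a row to the group; -U / -D retract one.
        new = old + 1 if kind in ("+I", "+U") else old - 1
        counts[order_number] = new
        # Retract the previously emitted count, if there was one.
        if old > 0:
            emitted.append(("-U" if new > 0 else "-D", order_number, old))
        # Emit the new count, unless the group became empty.
        if new > 0:
            emitted.append(("+U" if old > 0 else "+I", order_number, new))
    return emitted, {k: v for k, v in counts.items() if v > 0}


# Assumed input: three initial rows, then the UPDATE as a -U/+U pair per row.
changelog = [("+I", 10001, p) for p in (1, 2, 3)]
for p in (1, 2, 3):
    changelog += [("-U", 10001, p), ("+U", 10001, 5)]

emitted, final_counts = grouped_count(changelog)
print(final_counts)  # {10001: 3}
```

Because the grouping key order_number is the same in the before and after image of every updated row, each retraction and re-addition lands in the same group, so the total stays at 3.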



--
This message was sent by Atlassian Jira
(v8.3.4#803005)