You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Georg Heiler <ge...@gmail.com> on 2022/03/24 15:08:23 UTC
Flink SQL AVG with mandatory type casting
Hi,
I observe strange behavior in Flink SQL:
For an input stream:
CREATE TABLE input_stream (
duration int,
rating int
) WITH (
'connector' = 'kafka',
'topic' = 't',
'scan.startup.mode' = 'earliest-offset',
'format' = 'avro-confluent',
'avro-confluent.schema-registry.url' = 'http://localhost:8081/',
'properties.group.id' = 'flink-test-001',
'properties.bootstrap.servers' = 'localhost:9092'
);
The following SQL:
SELECT AVG(duration) AS duration_mean, AVG(CAST(rating AS DOUBLE)) AS
rating_mean FROM input_stream;
returns:
duration_mean rating_mean
45 2.503373819163293
I.e. duration_mean is truncated to an INT!
Any other database system I know by default outputs a DOUBLE type for any
input (including INT) and does not truncate it.
Why does Flink decide to truncate here? Why is a manual type cast necessary?
Best,
Georg
Re: Flink SQL AVG with mandatory type casting
Posted by Ingo Bürk <ai...@apache.org>.
Hi Georg,
the Flink implementations seem to be based off of SQL Server[1], which
has similar (though better documented) behavior for integer-like data types.
[1]
https://docs.microsoft.com/en-us/sql/t-sql/functions/avg-transact-sql?view=sql-server-ver15
Best
Ingo
On 24.03.22 16:08, Georg Heiler wrote:
> Hi,
>
> I observe strange behavior in Flink SQL:
> For an input stream:
>
> CREATE TABLE input_stream (
> duration int,
> rating int
>
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 't',
> 'scan.startup.mode' = 'earliest-offset',
> 'format' = 'avro-confluent',
> 'avro-confluent.schema-registry.url' = 'http://localhost:8081/
> <http://localhost:8081/>',
> 'properties.group.id <http://properties.group.id>' = 'flink-test-001',
> 'properties.bootstrap.servers' = 'localhost:9092'
> );
>
> The following SQL:
> SELECT AVG(duration) AS duration_mean, AVG(CAST(rating AS DOUBLE)) AS
> rating_mean FROM input_stream;
>
> returns:
>
> duration_mean rating_mean
> 45 2.503373819163293
>
> I.e. duration_mean is truncated to an INT!
>
> Any other database system I know by default outputs a DOUBLE type for
> any input (including INT) and does not truncate it.
>
> Why does Flink decide to truncate here? Why is a manual type cast necessary?
>
> Best,
> Georg