You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Georg Heiler <ge...@gmail.com> on 2022/03/24 15:08:23 UTC

Flink SQL AVG with mandatory type casting

Hi,

I observe strange behavior in Flink SQL:
For an input stream:

CREATE TABLE input_stream (
    duration int,
    rating int

) WITH (
    'connector' = 'kafka',
    'topic' = 't',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'avro-confluent',
    'avro-confluent.schema-registry.url' = 'http://localhost:8081/',
    'properties.group.id' = 'flink-test-001',
    'properties.bootstrap.servers' = 'localhost:9092'
);

The following SQL:
SELECT AVG(duration) AS  duration_mean, AVG(CAST(rating AS DOUBLE)) AS
rating_mean FROM input_stream;

returns:

duration_mean                    rating_mean
            45              2.503373819163293

I.e. duration_mean is truncated to an INT!

Any other database system I know by default outputs a DOUBLE type for any
input (including INT) and does not truncate it.

Why does Flink decide to truncate here? Why is a manual type cast necessary?

Best,
Georg

Re: Flink SQL AVG with mandatory type casting

Posted by Ingo Bürk <ai...@apache.org>.
Hi Georg,

the Flink implementations seem to be based off of SQL Server[1], which 
has similar (though better documented) behavior for integer-like data types.

[1] 
https://docs.microsoft.com/en-us/sql/t-sql/functions/avg-transact-sql?view=sql-server-ver15


Best
Ingo

On 24.03.22 16:08, Georg Heiler wrote:
> Hi,
> 
> I observe strange behavior in Flink SQL:
> For an input stream:
> 
> CREATE TABLE input_stream (
>      duration int,
>      rating int
> 
> ) WITH (
>      'connector' = 'kafka',
>      'topic' = 't',
>      'scan.startup.mode' = 'earliest-offset',
>      'format' = 'avro-confluent',
>      'avro-confluent.schema-registry.url' = 'http://localhost:8081/ 
> <http://localhost:8081/>',
>      'properties.group.id <http://properties.group.id>' = 'flink-test-001',
>      'properties.bootstrap.servers' = 'localhost:9092'
> );
> 
> The following SQL:
> SELECT AVG(duration) AS  duration_mean, AVG(CAST(rating AS DOUBLE)) AS 
> rating_mean FROM input_stream;
> 
> returns:
> 
> duration_mean                    rating_mean
>              45              2.503373819163293
> 
> I.e. duration_mean is truncated to an INT!
> 
> Any other database system I know by default outputs a DOUBLE type for 
> any input (including INT) and does not truncate it.
> 
> Why does Flink decide to truncate here? Why is a manual type cast necessary?
> 
> Best,
> Georg