Posted to issues@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2022/07/05 07:58:00 UTC

[jira] [Comment Edited] (FLINK-26764) Support RESPECT NULLS for FIRST_VALUE/LAST_VALUE

    [ https://issues.apache.org/jira/browse/FLINK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562047#comment-17562047 ] 

Jark Wu edited comment on FLINK-26764 at 7/5/22 7:57 AM:
---------------------------------------------------------

I checked some resources[1][2][3], and it seems the default behavior of first_value is "respect nulls"[3]: 

> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS. 

I think we should follow the SQL standard while keeping compatibility. Therefore, we should change the default behavior to respect nulls and provide an option for users to switch back to the previous behavior.

Besides, [~luoyuxia] could you help to check the null behavior of LEAD and LAG in Flink SQL? We should also fix them if they ignore nulls. 

Regarding the config option name, I would suggest {{table.exec.navigation-functions.null-treatment=respect_nulls/ignore_nulls}}, or 
{{table.exec.first-last-value.null-treatment=respect_nulls/ignore_nulls}} if we only need to fix first_value and last_value.
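
To make the two treatments concrete, here is a minimal Python sketch of what each option value would mean for a single ordered window frame. It only illustrates the semantics and is not Flink's implementation; the function names and the {{null_treatment}} parameter are hypothetical and simply mirror the option values suggested above.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_value(rows: Sequence[Optional[T]],
                null_treatment: str = "respect_nulls") -> Optional[T]:
    """Return the first value of an ordered frame (hypothetical helper).

    respect_nulls: return the first row as-is, even if it is None (NULL).
    ignore_nulls:  skip None rows and return the first non-None value.
    """
    if null_treatment == "respect_nulls":
        return rows[0] if rows else None
    if null_treatment == "ignore_nulls":
        return next((v for v in rows if v is not None), None)
    raise ValueError(f"unknown null treatment: {null_treatment}")


def last_value(rows: Sequence[Optional[T]],
               null_treatment: str = "respect_nulls") -> Optional[T]:
    """Same as first_value, but scanning the frame from the end."""
    return first_value(list(reversed(rows)), null_treatment)


frame = [None, 3, 7, None]
print(first_value(frame))                   # respect nulls
print(first_value(frame, "ignore_nulls"))   # ignore nulls
print(last_value(frame))                    # respect nulls
print(last_value(frame, "ignore_nulls"))    # ignore nulls
```

With the frame above, the two treatments give different results whenever the first or last row is NULL, which is why changing the default would be visible to existing users and why a switch back to the old behavior is useful.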


[1]: https://modern-sql.com/caniuse/T617
[2]: https://modern-sql.com/caniuse/first_value
[3]: https://www.postgresql.org/docs/current/functions-window.html



was (Author: jark):
I checked some resources[1][2][3], and it seems the default behavior of first_value is "respect nulls"[3]: 

> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS. 

I think we should follow the SQL standard while keeping compatibility. Therefore, I agree with [~godfreyhe] on adding a config option to respect nulls (defaulting to ignore nulls).

Besides, [~luoyuxia] could you help to check the null behavior of LEAD and LAG in Flink SQL? We should also fix them if they ignore nulls. 

Regarding the config option name, I would suggest {{table.exec.navigation-functions.null-treatment=respect_nulls/ignore_nulls}}, or 
{{table.exec.first-last-value.null-treatment=respect_nulls/ignore_nulls}} if we only need to fix first_value and last_value.


[1]: https://modern-sql.com/caniuse/T617
[2]: https://modern-sql.com/caniuse/first_value
[3]: https://www.postgresql.org/docs/current/functions-window.html


> Support RESPECT NULLS for FIRST_VALUE/LAST_VALUE
> -------------------------------------------------
>
>                 Key: FLINK-26764
>                 URL: https://issues.apache.org/jira/browse/FLINK-26764
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API, Table SQL / Planner
>            Reporter: luoyuxia
>            Assignee: luoyuxia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>
> Flink supports the functions FIRST_VALUE/LAST_VALUE, but they always ignore null values.
> However, [Spark|https://spark.apache.org/docs/2.4.2/api/sql/index.html#first_value], [Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+windowingandanalytics], [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions057.htm], [Snowflake|https://docs.snowflake.com/en/sql-reference/functions/first_value.html], and others also support respecting nulls for FIRST_VALUE/LAST_VALUE.
> Should we also allow users to specify whether to ignore nulls?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)