Posted to issues@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2022/07/05 07:58:00 UTC
[jira] [Comment Edited] (FLINK-26764) Support RESPECT NULLS for FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/FLINK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562047#comment-17562047 ]
Jark Wu edited comment on FLINK-26764 at 7/5/22 7:57 AM:
---------------------------------------------------------
I checked some resources[1][2][3], and it seems the default behavior of first_value is "respect nulls"[3]:
> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS.
I think we should follow the SQL standard while keeping compatibility. Therefore, I think we should change the default behavior to respect nulls, and provide an option for users to switch back to the previous behavior.
Besides, [~luoyuxia] could you help check the null behavior of LEAD and LAG in Flink SQL? We should also fix them if they ignore nulls.
Regarding the config option name, I would suggest {{table.exec.navigation-functions.null-treatment=respect_nulls/ignore_nulls}}, or
{{table.exec.first-last-value.null-treatment=respect_nulls/ignore_nulls}} if we only need to fix first_value and last_value.
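To make the two behaviors concrete, here is a minimal sketch in plain Python (not Flink code). The {{first_value}}/{{last_value}} helpers and their {{null_treatment}} parameter are hypothetical stand-ins that mirror the option values proposed above; the sketch also assumes the window frame covers the whole ordered list, which is a simplification.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_value(values: Sequence[Optional[T]],
                null_treatment: str = "respect_nulls") -> Optional[T]:
    """Return the first value of an ordered window.

    `null_treatment` is a hypothetical parameter modeled on the proposed
    option values; None plays the role of SQL NULL.
    """
    if null_treatment == "respect_nulls":
        # NULL counts as a value: return whatever comes first.
        return values[0] if values else None
    if null_treatment == "ignore_nulls":
        # Skip NULLs: return the first non-NULL value, or NULL if there is none.
        return next((v for v in values if v is not None), None)
    raise ValueError(f"unknown null treatment: {null_treatment!r}")


def last_value(values: Sequence[Optional[T]],
               null_treatment: str = "respect_nulls") -> Optional[T]:
    """Same as first_value, applied to the window in reverse order."""
    return first_value(list(reversed(values)), null_treatment)


if __name__ == "__main__":
    window = [None, 3, 5, None]
    for treatment in ("respect_nulls", "ignore_nulls"):
        print(treatment,
              first_value(window, treatment),
              last_value(window, treatment))
```

With the sample window, respecting nulls returns NULL for both functions, while ignoring nulls returns 3 and 5.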
[1]: https://modern-sql.com/caniuse/T617
[2]: https://modern-sql.com/caniuse/first_value
[3]: https://www.postgresql.org/docs/current/functions-window.html
was (Author: jark):
I checked some resources[1][2][3], and it seems the default behavior of first_value is "respect nulls"[3]:
> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS.
I think we should follow the SQL standard while keeping compatibility. Therefore, I agree with [~godfreyhe] on adding a config option to respect nulls (with ignore nulls as the default).
Besides, [~luoyuxia] could you help check the null behavior of LEAD and LAG in Flink SQL? We should also fix them if they ignore nulls.
Regarding the config option name, I would suggest {{table.exec.navigation-functions.null-treatment=respect_nulls/ignore_nulls}}, or
{{table.exec.first-last-value.null-treatment=respect_nulls/ignore_nulls}} if we only need to fix first_value and last_value.
[1]: https://modern-sql.com/caniuse/T617
[2]: https://modern-sql.com/caniuse/first_value
[3]: https://www.postgresql.org/docs/current/functions-window.html
> Support RESPECT NULLS for FIRST_VALUE/LAST_VALUE
> -------------------------------------------------
>
> Key: FLINK-26764
> URL: https://issues.apache.org/jira/browse/FLINK-26764
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / API, Table SQL / Planner
> Reporter: luoyuxia
> Assignee: luoyuxia
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Flink supports the functions FIRST_VALUE/LAST_VALUE, but they always ignore null values.
> However, [Spark|https://spark.apache.org/docs/2.4.2/api/sql/index.html#first_value], [Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+windowingandanalytics], [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions057.htm], [Snowflake|https://docs.snowflake.com/en/sql-reference/functions/first_value.html], etc. also support respecting nulls for FIRST_VALUE/LAST_VALUE.
> Should we also allow users to specify whether to ignore nulls?