Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2022/10/23 23:40:00 UTC

[jira] [Updated] (SPARK-40892) Loosen the requirement of window_time rule - allow multiple window_time calls

     [ https://issues.apache.org/jira/browse/SPARK-40892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-40892:
---------------------------------
    Description: 
SPARK-40821 introduces a new SQL function "window_time" to extract the representative time from a window (carrying over the event time metadata as well, if feasible).

SPARK-40821 followed the existing rule for time window / session window, which allows only a single function call in the same projection (strictly speaking, multiple calls count as one if the function is called with the same parameters).

For the existing rules, the restriction makes sense, since allowing multiple calls would produce a cartesian product of rows (although Spark can handle it). But given that window_time produces only one value, the restriction no longer makes sense.
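
The difference can be sketched in plain Python, without Spark. This is only an illustrative model and not Spark's implementation: the helper names sliding_windows and window_time_of are hypothetical, and the sketch assumes that window_time returns the window end minus one microsecond, reflecting the exclusive upper bound of [start, end).

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def sliding_windows(ts, size, slide):
    """Hypothetical model: every [start, end) window of length `size`,
    stepped by `slide` and aligned to the epoch, that contains `ts`."""
    # Start of the latest window that begins at or before ts.
    start = ts - ((ts - EPOCH) % slide)
    windows = []
    while start + size > ts:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

def window_time_of(window):
    """Assumed semantics: the end of the window minus one microsecond."""
    _, end = window
    return end - timedelta(microseconds=1)

ts = datetime(2022, 10, 23, 23, 40)
five_min = timedelta(minutes=5)

# Two differently parameterized sliding windows over the same input row.
w1 = sliding_windows(ts, timedelta(minutes=10), five_min)
w2 = sliding_windows(ts, timedelta(minutes=15), five_min)

# Pairing them in one projection multiplies the rows (cartesian product).
pairs = [(a, b) for a in w1 for b in w2]

# The window_time model maps each window to exactly one value, so the
# number of rows does not grow.
times = [window_time_of(w) for w in w1]
```

In this model, one input row yields len(w1) * len(w2) rows once both window calls are combined, while applying window_time_of to a window always yields a single value per row.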

It would be better to unlock the functionality. Note that this means the resulting column of "window_time()" is no longer named "window_time". (This is the convention most function calls follow. The time window and session window rules don't follow it, so arguably they have a bug, but fixing it would break backward compatibility...)

  was:
SPARK-40821 introduces a new SQL function "window_time" to extract the representative time from a window (carrying over the event time metadata as well, if feasible).

SPARK-40821 followed the existing rule for time window / session window, which allows only a single function call in the same projection (strictly speaking, multiple calls count as one if the function is called with the same parameters).

For the existing rules, the restriction makes sense, since allowing multiple calls would produce a cartesian product of rows (although Spark can handle it). But given that window_time produces only one value, the restriction no longer makes sense.

It would be better to unlock the functionality. Note that this means the resulting column of "window_time()" is no longer named "window_time".


> Loosen the requirement of window_time rule - allow multiple window_time calls
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-40892
>                 URL: https://issues.apache.org/jira/browse/SPARK-40892
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> SPARK-40821 introduces a new SQL function "window_time" to extract the representative time from a window (carrying over the event time metadata as well, if feasible).
> SPARK-40821 followed the existing rule for time window / session window, which allows only a single function call in the same projection (strictly speaking, multiple calls count as one if the function is called with the same parameters).
> For the existing rules, the restriction makes sense, since allowing multiple calls would produce a cartesian product of rows (although Spark can handle it). But given that window_time produces only one value, the restriction no longer makes sense.
> It would be better to unlock the functionality. Note that this means the resulting column of "window_time()" is no longer named "window_time". (This is the convention most function calls follow. The time window and session window rules don't follow it, so arguably they have a bug, but fixing it would break backward compatibility...)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
