Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2023/02/24 05:21:00 UTC

[jira] [Created] (SPARK-42549) Allow multiple event time columns in single DataFrame

Jungtaek Lim created SPARK-42549:
------------------------------------

             Summary: Allow multiple event time columns in single DataFrame
                 Key: SPARK-42549
                 URL: https://issues.apache.org/jira/browse/SPARK-42549
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.5.0
            Reporter: Jungtaek Lim


This is a follow-up task for SPARK-42376.

I've observed that all stateful operators deal with the event time column by finding the "first" event time column in the input and picking it up. This was safe before we supported multiple stateful operators: the only way a single DataFrame could contain more than one event time column was a stream-stream join, and back then we didn't support adding another stateful operator after the join, so it was safe to assume there is only one event time column.
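For illustration, a minimal sketch (not the actual operator code) of the "pick the first one" pattern described above. It assumes the internal convention that withWatermark() tags the column's metadata with the watermark delay key "spark.watermarkDelayMs" (EventTimeWatermark.delayKey), written here as a plain string:

{code:scala}
import org.apache.spark.sql.DataFrame

// Event time columns are tagged in column metadata when withWatermark() is
// applied. An operator that scans the schema and takes the first match
// silently ignores any additional event time columns.
def firstEventTimeColumn(df: DataFrame): Option[String] =
  df.schema.fields
    .find(_.metadata.contains("spark.watermarkDelayMs"))
    .map(_.name)
{code}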

With the support of multiple stateful operators, that assumption no longer holds. Users can add another stateful operator after a stream-stream join, which breaks the assumption. Even in the case of a (stream-stream join)-stream join, either input of the outer join can itself carry multiple event time columns.
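A minimal Scala sketch of the problematic shape, using the built-in rate source and hypothetical column names: both join inputs define their own watermarked event time column, both columns survive into the join output, and a downstream stateful aggregation then has two candidates to choose from.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("multi-event-time").getOrCreate()
import spark.implicits._

val clicks = spark.readStream.format("rate").load()
  .select($"value".as("clickId"), $"timestamp".as("clickTime"))
  .withWatermark("clickTime", "10 minutes")

val impressions = spark.readStream.format("rate").load()
  .select($"value".as("impressionId"), $"timestamp".as("impressionTime"))
  .withWatermark("impressionTime", "20 minutes")

// The join output carries BOTH clickTime and impressionTime as event time columns.
val joined = clicks.join(impressions, expr(
  """
  clickId = impressionId AND
  clickTime BETWEEN impressionTime AND impressionTime + interval 1 hour
  """))

// A second stateful operator after the join: it is ambiguous which of the two
// event time columns should drive this aggregation's watermark.
val counts = joined
  .groupBy(window($"clickTime", "1 hour"))
  .count()
{code}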

Since it's non-trivial to reason about the right behavior for all stateful operators, SPARK-42376 simply disallows a DataFrame from having more than one event time column. (The check is performed in each stateful operator against its input DataFrame.)
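In spirit (not the actual implementation), the SPARK-42376 guard amounts to a check like the following on the operator's input schema, reusing the metadata convention sketched earlier:

{code:scala}
// Hedged sketch of the single-event-time-column guard: reject any input
// schema that carries more than one watermarked column.
val eventTimeCols =
  df.schema.fields.filter(_.metadata.contains("spark.watermarkDelayMs"))
require(eventTimeCols.length <= 1,
  s"More than one event time column in input: ${eventTimeCols.map(_.name).mkString(", ")}")
{code}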

This ticket tracks the effort to reason about the right behavior for all stateful operators, and to support more than one event time column in the input DataFrame for stateful operators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org