You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/09/20 19:59:00 UTC

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

    [ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622623#comment-16622623 ] 

Jungtaek Lim edited comment on SPARK-10816 at 9/20/18 7:58 PM:
---------------------------------------------------------------

I'm aware of the needs for more advanced cases (like dynamic gap session window), but for simple case of session window we still have a chance to make it pretty simple. For DSL we may want to provide advanced (complicated) cases, but for SQL why not support basic case which can be expressed as SQL statement?

map/flatMapGroupsWithState is something reserved for experienced users: end users have to understand how typed API works, and the limitation of defining watermark in metadata of column (when row is serialized to object the information is lost. there's relevant issue in Spark JIRA but we just identify it as limitation and end users are dealing with it), how to craft state function correctly.

Moreover, as I mentioned in doc, map/flapMapGroupsWithState don't handle multiple sessions in same key which is even not enough to handle fixed gap of session window. Event time and watermark would require us to deal with arbitrary changes of sessions, like multiple sessions which are not yet target of eviction, as well as multiple sessions being merged into one due to late event. Current mechanism of map/flapMapGroupsWithState don't handle this, and at least require end users to deal with it at their own hands.


was (Author: kabhwan):
I'm aware of the needs for more advanced cases (like dynamic gap session window), but for simple case of session window we still have a chance to make it pretty simple. For DSL we may want to provide advanced (complicated) cases, but for SQL why not support basic case which can be expressed as SQL statement?

map/flatMapGroupsWithState is something reserved for experts: end users have to understand how typed API works, and the limitation of defining watermark in metadata of column (when row is serialized to object the information is lost. there's relevant issue in Spark JIRA but we just identify it as limitation and end users are dealing with it), how to craft state function correctly.

Moreover, as I mentioned in doc, map/flapMapGroupsWithState don't handle multiple sessions in same key which is even not enough to handle fixed gap of session window. Event time and watermark would require us to deal with arbitrary changes of sessions, like multiple sessions which are not yet target of eviction, as well as multiple sessions being merged into one due to late event. Current mechanism of map/flapMapGroupsWithState don't handle this, and at least require end users to deal with it at their own hands.

> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org