Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2021/03/29 03:38:00 UTC

[jira] [Created] (SPARK-34892) Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

Jungtaek Lim created SPARK-34892:
------------------------------------

             Summary: Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
                 Key: SPARK-34892
                 URL: https://issues.apache.org/jira/browse/SPARK-34892
             Project: Spark
          Issue Type: Sub-task
          Components: Structured Streaming
    Affects Versions: 3.2.0
            Reporter: Jungtaek Lim


This issue tracks the effort to introduce MergingSortWithSessionWindowStateIterator, which efficiently ensures the sort order across input rows and rows in state. The iterator requires, as a precondition, that input rows are already sorted, and it assumes that the number of rows in state per group key is small. As the name suggests, the iterator performs a merge sort between the two and provides elements one by one.

The precondition will be guaranteed by the physical node, and the assumption is most likely true unless the watermark gap is set on the order of hours and there are many old but not late input rows.
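
For illustration only, the merge described above could be sketched roughly as follows in Python. This is not the Spark implementation: SessionRow, merge_sorted_rows, merging_sort_with_state and the state_by_key dictionary are hypothetical names, with the dictionary standing in for the state store. The sketch assumes input rows arrive sorted by (key, start), that state rows for each key are already sorted by start, and that on equal start times the state row is emitted first (an arbitrary choice made here). Keys that exist only in state are not emitted.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, NamedTuple


class SessionRow(NamedTuple):
    # Hypothetical row shape, for illustration only.
    key: str      # group key
    start: int    # session start timestamp
    end: int      # session end timestamp
    source: str   # "input" or "state"


def merge_sorted_rows(input_rows: Iterable[SessionRow],
                      state_rows: Iterable[SessionRow]) -> Iterator[SessionRow]:
    """Lazily merge two iterables that are each already sorted by start."""
    it_in, it_st = iter(input_rows), iter(state_rows)
    cur_in = next(it_in, None)
    cur_st = next(it_st, None)
    # Classic two-way merge step: emit the smaller head, then advance it.
    while cur_in is not None and cur_st is not None:
        if cur_st.start <= cur_in.start:  # assumption: state wins ties
            yield cur_st
            cur_st = next(it_st, None)
        else:
            yield cur_in
            cur_in = next(it_in, None)
    # Drain whichever side still has rows.
    while cur_in is not None:
        yield cur_in
        cur_in = next(it_in, None)
    while cur_st is not None:
        yield cur_st
        cur_st = next(it_st, None)


def merging_sort_with_state(
        input_rows: Iterable[SessionRow],
        state_by_key: Dict[str, List[SessionRow]]) -> Iterator[SessionRow]:
    """For each group key seen in the sorted input, merge in its state rows."""
    for key, group in groupby(input_rows, key=lambda r: r.key):
        # Only the (assumed small) state rows of the current key are loaded.
        yield from merge_sorted_rows(group, state_by_key.get(key, []))


if __name__ == "__main__":
    rows = [SessionRow("a", 10, 15, "input"), SessionRow("a", 30, 35, "input")]
    state = {"a": [SessionRow("a", 20, 25, "state")]}
    for row in merging_sort_with_state(rows, state):
        print(row)
```

Because both sides are consumed one element at a time, the cost per group key is linear in the number of input and state rows, and no full re-sort of the combined rows is needed.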



--
This message was sent by Atlassian Jira
(v8.3.4#803005)