Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2021/07/14 09:49:00 UTC

[jira] [Resolved] (SPARK-34892) Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

     [ https://issues.apache.org/jira/browse/SPARK-34892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-34892.
----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33077
[https://github.com/apache/spark/pull/33077]

> Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34892
>                 URL: https://issues.apache.org/jira/browse/SPARK-34892
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.2.0
>
>
> This issue tracks the effort to introduce MergingSortWithSessionWindowStateIterator, which ensures the sort order between input rows and rows in state in an efficient way. MergingSortWithSessionWindowStateIterator requires the precondition that the input rows are already sorted, and assumes that the number of rows in state per group key is small. As the name suggests, the iterator performs a merge sort between the two sources and provides the elements one by one.
> The precondition is guaranteed by the physical node, and the assumption holds in most cases unless the watermark gap is set to something large (e.g. hours) and there are quite a lot of old but not-yet-late input rows.
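
For illustration only, a minimal Scala sketch of that merge-sort idea (this is not the code merged in pull request 33077): SessionRow and MergingSortIterator below are hypothetical stand-ins for Spark's internal rows and state-store reader, and both sources are assumed to be pre-sorted by (group key, session start).

    // Hypothetical simplified row: group key, session start time, payload.
    case class SessionRow(key: String, sessionStart: Long, payload: String)

    // Merges two already-sorted sources (new input rows and rows restored from
    // state) into a single iterator that preserves (key, sessionStart) order.
    class MergingSortIterator(
        input: Iterator[SessionRow],      // sorted input rows (precondition)
        stateRows: Iterator[SessionRow])  // sorted rows read from state
      extends Iterator[SessionRow] {

      private val ordering = Ordering[(String, Long)]
      private val bufferedInput = input.buffered
      private val bufferedState = stateRows.buffered

      override def hasNext: Boolean = bufferedInput.hasNext || bufferedState.hasNext

      override def next(): SessionRow = {
        if (!bufferedInput.hasNext) {
          bufferedState.next()
        } else if (!bufferedState.hasNext) {
          bufferedInput.next()
        } else {
          // Peek at both heads and pop whichever has the smaller (key, sessionStart),
          // so the merged stream stays globally sorted.
          val in = bufferedInput.head
          val st = bufferedState.head
          if (ordering.lteq((in.key, in.sessionStart), (st.key, st.sessionStart))) {
            bufferedInput.next()
          } else {
            bufferedState.next()
          }
        }
      }
    }

    object Demo extends App {
      val input = Iterator(SessionRow("a", 10L, "input"), SessionRow("b", 5L, "input"))
      val state = Iterator(SessionRow("a", 7L, "state"), SessionRow("b", 20L, "state"))
      new MergingSortIterator(input, state).foreach(println)
      // Emits: (a,7,state), (a,10,input), (b,5,input), (b,20,state)
    }

Because both inputs are sorted and the per-key state is assumed small, each call to next() only peeks at the head of each source, so the merge runs in a single pass without buffering whole groups.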


