Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2021/03/29 03:38:00 UTC
[jira] [Created] (SPARK-34892) Introduce
MergingSortWithSessionWindowStateIterator sorting input rows and rows in
state efficiently
Jungtaek Lim created SPARK-34892:
------------------------------------
Summary: Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
Key: SPARK-34892
URL: https://issues.apache.org/jira/browse/SPARK-34892
Project: Spark
Issue Type: Sub-task
Components: Structured Streaming
Affects Versions: 3.2.0
Reporter: Jungtaek Lim
This issue tracks the effort to introduce MergingSortWithSessionWindowStateIterator, which will efficiently produce a single sorted sequence from input rows and rows in state. MergingSortWithSessionWindowStateIterator will require, as a precondition, that input rows are sorted, and will assume that the number of rows in state per group key is small. As the name suggests, the iterator will merge-sort the two sources and provide elements one by one.
The precondition will be guaranteed by the physical node, and the assumption is most likely true unless the watermark gap is set to something like hours and there are a lot of old but not late input rows.
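
To illustrate the idea, here is a minimal sketch in Python of merging pre-sorted input rows with a short per-key list of state rows. It is not the Spark implementation. The row layout (group key, session start, origin tag), the function name merging_sort_with_state, the state_rows_for_key lookup, and the tie-breaking rule (state rows first) are all assumptions made for the example. The sketch also assumes that only keys present in the input are looked up in state.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical row layout: (group key, session start, origin tag).
Row = Tuple[str, int, str]


def merging_sort_with_state(
    sorted_input: Iterable[Row],
    state_rows_for_key: Callable[[str], List[Row]],
) -> Iterator[Row]:
    """Yield input rows and state rows in (key, session start) order.

    Assumes sorted_input is already sorted by (key, session start) and
    that state_rows_for_key returns a short list sorted by session start.
    """
    no_key = object()          # sentinel: no group key seen yet
    current_key = no_key
    pending: List[Row] = []    # state rows of the current key
    pos = 0                    # index of the next state row to emit

    for row in sorted_input:
        key, start = row[0], row[1]
        if key != current_key:
            # New group key: flush leftover state rows of the previous
            # key, then load the state rows for the new key.
            yield from pending[pos:]
            current_key = key
            pending = state_rows_for_key(key)
            pos = 0
        # Emit state rows that start at or before this input row
        # (ties go to state rows; this is an arbitrary choice here).
        while pos < len(pending) and pending[pos][1] <= start:
            yield pending[pos]
            pos += 1
        yield row

    # Flush state rows left over for the last key.
    yield from pending[pos:]


# Usage example with made-up data.
state = {"a": [("a", 5, "state"), ("a", 20, "state")]}
rows = [("a", 1, "input"), ("a", 10, "input"), ("b", 3, "input")]
merged = list(merging_sort_with_state(rows, lambda k: state.get(k, [])))
```

Because the input is consumed as a stream and only the state rows of one key are held at a time, the extra memory is bounded by the number of state rows per key, which is why the small-state assumption matters.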
--
This message was sent by Atlassian Jira
(v8.3.4#803005)