Posted to issues@spark.apache.org by "luigi (Jira)" <ji...@apache.org> on 2021/12/16 09:02:00 UTC
[jira] [Created] (SPARK-37662) exception when handling late data with watermarking and window
luigi created SPARK-37662:
-----------------------------
Summary: exception when handling late data with watermarking and window
Key: SPARK-37662
URL: https://issues.apache.org/jira/browse/SPARK-37662
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.2.0
Environment: Spark v3.2.0
Scala v2.12.12
Reporter: luigi
When I use a watermark to drop late data and, at the same time, a window column for stateful de-duplication, the order of the two calls causes unexpected behavior.
a) The code below throws an exception stating: "Couldn't find timestamp#58-T5000ms in [window#550-T5000ms,raid#132L,app#528]"
{code:scala}
// (a) watermark applied before the window column is added
withWatermark("timestamp", "5 seconds").
withColumn("window", window($"timestamp", "1 hours")).
dropDuplicates("window", "raid", "app").
{code}
b) But when I switch the order of the watermark and window calls as below, it works without any exception:
{code:scala}
// (b) window column added before the watermark is applied
withColumn("window", window($"timestamp", "1 hours")).
withWatermark("timestamp", "5 seconds").
dropDuplicates("window", "raid", "app").
{code}
Please note that this issue does not occur on Spark v3.1.2.
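For anyone who wants to try the two orderings side by side, here is a minimal, self-contained sketch. It is not the original pipeline: the built-in "rate" source stands in for the real input, the values of the raid and app columns are synthesized, and the object name, console sink and 30-second timeout are arbitrary choices for illustration. It has not been confirmed that this exact program triggers the exception on 3.2.0; it only wires the same three calls into something runnable.
{code:scala}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, window}

// Hypothetical harness; pass "a" (default) or "b" as the first argument.
object Spark37662Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-37662-repro")
      .getOrCreate()
    import spark.implicits._

    // Assumption: the "rate" source replaces the real stream. It emits
    // `timestamp` (TimestampType) and `value` (LongType); `raid` and `app`
    // are derived here only so the de-duplication keys exist.
    val events: DataFrame = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("raid", $"value" % 10)
      .withColumn("app", lit("demo"))

    // Ordering (a): watermark first, then window.
    def watermarkThenWindow(df: DataFrame): DataFrame =
      df.withWatermark("timestamp", "5 seconds")
        .withColumn("window", window($"timestamp", "1 hours"))
        .dropDuplicates("window", "raid", "app")

    // Ordering (b): window first, then watermark.
    def windowThenWatermark(df: DataFrame): DataFrame =
      df.withColumn("window", window($"timestamp", "1 hours"))
        .withWatermark("timestamp", "5 seconds")
        .dropDuplicates("window", "raid", "app")

    val useOrderingA = args.headOption.forall(_ == "a")
    val deduped =
      if (useOrderingA) watermarkThenWindow(events) else windowThenWatermark(events)

    val query = deduped.writeStream
      .format("console")
      .outputMode("append")
      .start()

    // Run for up to 30 seconds; a failure in the query surfaces here
    // as a StreamingQueryException.
    query.awaitTermination(30000L)
    query.stop()
    spark.stop()
  }
}
{code}
Running it once with "a" and once with "b" on both 3.1.2 and 3.2.0 should show whether the ordering alone is enough to trigger the difference, or whether something else in the original job is involved.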
--
This message was sent by Atlassian Jira
(v8.20.1#820001)