Posted to issues@spark.apache.org by "luigi (Jira)" <ji...@apache.org> on 2021/12/16 09:02:00 UTC

[jira] [Created] (SPARK-37662) exception when handling late data with watermarking and window

luigi created SPARK-37662:
-----------------------------

             Summary: exception when handling late data with watermarking and window
                 Key: SPARK-37662
                 URL: https://issues.apache.org/jira/browse/SPARK-37662
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.2.0
         Environment: Spark v3.2.0, Scala v2.12.12
            Reporter: luigi


When I use a watermark to drop late data together with a window column for stateful de-duplication, the order in which the two are applied causes unexpected behavior.

a) The code below throws an exception stating: "Couldn't find timestamp#58-T5000ms in [window#550-T5000ms,raid#132L,app#528]"
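{code:java}
// Variant (a): the watermark is applied before the window column is added.
// `df` is a placeholder for the streaming DataFrame, which the report does not name.
df.withWatermark("timestamp", "5 seconds")
  .withColumn("window", window($"timestamp", "1 hours"))
  .dropDuplicates("window", "raid", "app")
{code}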
b) But when I swap the order of the watermark and window calls, as below, it works without any exception:
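{code:java}
// Variant (b): the window column is added before the watermark is applied.
// `df` is a placeholder for the streaming DataFrame, which the report does not name.
df.withColumn("window", window($"timestamp", "1 hours"))
  .withWatermark("timestamp", "5 seconds")
  .dropDuplicates("window", "raid", "app")
{code}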
Please note that this issue does not occur on Spark v3.1.2.
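
For anyone trying to reproduce this, a minimal self-contained sketch of the two orderings might look like the following. The built-in rate source, the derived raid and app columns, the console sink, the 30-second timeout, and the object and method names are all assumptions made for illustration; the report does not describe the original job's source or sink, so this may not trigger the exception exactly as reported.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, window}

// Hypothetical reproduction harness; names are not from the original report.
object Spark37662Repro {

  // Variant (a) from the report: watermark first, then the window column.
  def watermarkThenWindow(df: DataFrame): DataFrame =
    df.withWatermark("timestamp", "5 seconds")
      .withColumn("window", window(col("timestamp"), "1 hours"))
      .dropDuplicates("window", "raid", "app")

  // Variant (b) from the report: window column first, then the watermark.
  def windowThenWatermark(df: DataFrame): DataFrame =
    df.withColumn("window", window(col("timestamp"), "1 hours"))
      .withWatermark("timestamp", "5 seconds")
      .dropDuplicates("window", "raid", "app")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-37662-repro")
      .getOrCreate()

    // Assumed input: the rate source emits (timestamp, value) rows;
    // raid and app are derived from value only to match the column names in the report.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("raid", col("value") % 10)
      .withColumn("app", (col("value") % 3).cast("string"))

    // Pass "a" as the first argument to select variant (a); anything else selects (b).
    val deduped =
      if (args.headOption.contains("a")) watermarkThenWindow(events)
      else windowThenWatermark(events)

    val query = deduped.writeStream
      .format("console")
      .outputMode("append")
      .start()

    // Run for up to 30 seconds, then shut down.
    query.awaitTermination(30000L)
    query.stop()
    spark.stop()
  }
}
```

Running it with the argument a selects the ordering reported to fail on 3.2.0; running it without arguments selects the ordering reported to work.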



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
