Posted to issues@spark.apache.org by "Chris Horn (JIRA)" <ji...@apache.org> on 2018/06/29 22:37:00 UTC

[jira] [Created] (SPARK-24699) Watermark / Append mode should work with Trigger.Once

Chris Horn created SPARK-24699:
----------------------------------

             Summary: Watermark / Append mode should work with Trigger.Once
                 Key: SPARK-24699
                 URL: https://issues.apache.org/jira/browse/SPARK-24699
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.3.1
            Reporter: Chris Horn


I have a use case where I would like to trigger a structured streaming job from an external scheduler (once every 15 minutes or so) and have it write window aggregates to Kafka.

My code works when running with `Trigger.ProcessingTime`. When I switch to `Trigger.Once`, however, the structured stream's watermark is not persisted to (or is not recovered from) the checkpoint state.

This causes the stream to never generate output because the watermark is perpetually stuck at `1970-01-01T00:00:00Z`.
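The symptom can be reproduced with a toy model, sketched in Python. This is not Spark's code. It assumes, consistent with the behavior above, that the watermark value is restored from the checkpoint, while the max event time used to advance it is held only in memory for the lifetime of one query run. `ToyQueryRun`, `processing_time`, `trigger_once` and the 10-minute `DELAY_MS` are all hypothetical. A watermark of 0 ms corresponds to `1970-01-01T00:00:00Z`.

```python
from typing import List, Optional

MIN = 60_000          # one minute in milliseconds
DELAY_MS = 10 * MIN   # hypothetical withWatermark delay of 10 minutes


class ToyQueryRun:
    """One process running the query (a toy model, not Spark's implementation)."""

    def __init__(self, checkpoint_watermark_ms: int) -> None:
        # Assumption: the watermark itself is restored from the checkpoint...
        self.watermark_ms = checkpoint_watermark_ms
        # ...but the previous batch's max event time lives only in memory,
        # so a freshly started run does not have it.
        self.prev_batch_max_ms: Optional[int] = None

    def run_batch(self, event_times_ms: List[int]) -> int:
        # Advance the watermark from the previous batch's input, if this run saw one.
        if self.prev_batch_max_ms is not None:
            self.watermark_ms = max(self.watermark_ms,
                                    self.prev_batch_max_ms - DELAY_MS)
        self.prev_batch_max_ms = max(event_times_ms) if event_times_ms else None
        return self.watermark_ms  # watermark used (and checkpointed) for this batch


def processing_time(batches: List[List[int]]) -> List[int]:
    # Trigger.ProcessingTime: one long-lived run executes every batch.
    run = ToyQueryRun(checkpoint_watermark_ms=0)
    return [run.run_batch(b) for b in batches]


def trigger_once(batches: List[List[int]]) -> List[int]:
    # Trigger.Once from an external scheduler: a fresh run for every batch.
    watermark_ms, history = 0, []
    for b in batches:
        watermark_ms = ToyQueryRun(watermark_ms).run_batch(b)
        history.append(watermark_ms)
    return history
```

With `batches = [[5 * MIN], [20 * MIN], [35 * MIN]]`, `processing_time(batches)` returns `[0, 0, 600000]`, so the watermark reaches 10 minutes. `trigger_once(batches)` returns `[0, 0, 0]` because every run starts without the in-memory statistics.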

I have created a failing test case in `EventTimeWatermarkSuite` and will open a [WIP] pull request on GitHub and link it here.

 

It also seems that even if the watermark were generated, the current streaming behavior means I would have to trigger the job twice to get any output.

The micro-batch execution appears to compute the watermark only from the previous batch's input and to emit new aggregates based on that timestamp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
