You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Liwei Lin (JIRA)" <ji...@apache.org> on 2017/03/13 04:44:04 UTC

[jira] [Created] (SPARK-19932) Also save event time into StateStore for certain cases

Liwei Lin created SPARK-19932:
---------------------------------

             Summary: Also save event time into StateStore for certain cases
                 Key: SPARK-19932
                 URL: https://issues.apache.org/jira/browse/SPARK-19932
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.1.0
            Reporter: Liwei Lin


<code>
spark
   .readStream                 // schema: (word, eventTime), like ("a", 10), ("a", 11), ("b", 12) ...
   ...
   .withWatermark("eventTime", "10 seconds")
   .dropDuplicates("word")     // note: "eventTime" is not part of the key columns
   ...
<code>

As shown above, right now if watermark is specified for a streaming dropDuplicates query, but not specified as the key columns, then we'll still get the correct answer, but the state just keeps growing and will never get cleaned up.

The reason is, the watermark attribute is not part of the key of the state store in this case. We're not saving event time information in the state store.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org