Posted to issues@spark.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/06/23 02:31:00 UTC

[jira] [Created] (SPARK-24634) Add a new metric regarding number of rows later than watermark

Jungtaek Lim created SPARK-24634:
------------------------------------

             Summary: Add a new metric regarding number of rows later than watermark
                 Key: SPARK-24634
                 URL: https://issues.apache.org/jira/browse/SPARK-24634
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.4.0
            Reporter: Jungtaek Lim


Spark filters out rows that are later than the watermark when applying operations that use event-time windows. While Spark exposes watermark information to StreamingQueryListener, it provides no information about rows that are filtered out because of the watermark. That information would help end users adjust the watermark while operating their queries.

We could expose a metric for the number of rows that are later than the watermark and are therefore filtered out. It would be ideal to support a side output for consuming late rows, but that does not look easy, so this issue addresses the metric first.
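
To make the proposal concrete, here is a toy model in plain Python of what such a metric would count. It is only a sketch and not Spark's implementation: the class WatermarkFilter, the field names num_late_rows and late_rows, and the simplified rule "drop a row when its event time is below the current watermark" are all assumptions made for illustration (Spark's actual filtering depends on the stateful operator and on window boundaries).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[int, str]  # (event_time, payload); event time in arbitrary units


@dataclass
class WatermarkFilter:
    """Toy model of watermark-based filtering (hypothetical, not Spark code)."""
    delay: int                    # allowed lateness, same units as event time
    watermark: int = 0            # current watermark, fixed during a batch
    max_event_time: int = 0       # largest event time seen so far
    num_late_rows: int = 0        # hypothetical metric proposed in this issue
    late_rows: List[Row] = field(default_factory=list)  # hypothetical side output

    def process_batch(self, rows: List[Row]) -> List[Row]:
        kept = []
        for event_time, payload in rows:
            if event_time < self.watermark:
                # Row is later than the watermark: count it and divert it
                # instead of dropping it silently.
                self.num_late_rows += 1
                self.late_rows.append((event_time, payload))
            else:
                kept.append((event_time, payload))
                self.max_event_time = max(self.max_event_time, event_time)
        # Advance the watermark only after the batch; it never moves backwards.
        self.watermark = max(self.watermark, self.max_event_time - self.delay)
        return kept


wf = WatermarkFilter(delay=10)
first = wf.process_batch([(100, "a"), (105, "b")])
second = wf.process_batch([(90, "c"), (96, "d"), (120, "e")])
print(wf.num_late_rows, wf.late_rows, wf.watermark)
```

In this sketch the watermark is 95 after the first batch, so the row with event time 90 in the second batch is the only one counted as late. A steadily growing count of this kind is the signal that would tell a user the watermark delay is too tight.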



