Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/03/23 18:18:00 UTC

[jira] [Commented] (SPARK-34297) Add metrics for data loss and offset out range for KafkaMicroBatchStream

    [ https://issues.apache.org/jira/browse/SPARK-34297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307321#comment-17307321 ] 

Apache Spark commented on SPARK-34297:
--------------------------------------

User 'yijiacui-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/31944

> Add metrics for data loss and offset out range for KafkaMicroBatchStream
> ------------------------------------------------------------------------
>
>                 Key: SPARK-34297
>                 URL: https://issues.apache.org/jira/browse/SPARK-34297
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> When testing Structured Streaming, I found it is hard to track data loss when reading from Kafka. The micro scan node has only one metric, number of output rows. Users have no idea how many times the offsets to fetch were out of range in Kafka, or how many times data loss happened.
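
The gap described above can be illustrated with a minimal, language-agnostic sketch. This is not the actual Spark API or the code from the linked pull request; the names (SumMetric, KafkaReadCounters, record_data_loss) are hypothetical. It only shows the general shape of the fix: each task keeps local counters for data-loss and offset-out-of-range events, and the driver aggregates them into per-query metrics alongside the existing "number of output rows".

```python
class SumMetric:
    """A driver-side metric that aggregates per-task values by summing them.
    (Hypothetical stand-in for a custom sum metric; not Spark's real class.)"""

    def __init__(self, name, description):
        self.name = name
        self.description = description

    def aggregate(self, task_values):
        # Each element of task_values is one task's local count.
        return sum(task_values)


class KafkaReadCounters:
    """Per-task counters a Kafka reader could bump while fetching offsets.
    All names are illustrative, not taken from the actual patch."""

    def __init__(self):
        self.data_loss_count = 0
        self.offset_out_of_range_count = 0

    def record_data_loss(self):
        # Called when expected offsets are no longer available (e.g. aged out).
        self.data_loss_count += 1

    def record_offset_out_of_range(self):
        # Called when a requested offset falls outside the broker's range.
        self.offset_out_of_range_count += 1
```

At the end of a micro-batch, the driver would collect each task's counters and feed them into the corresponding SumMetric, so users can finally see how often these events occur rather than only the output row count.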



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org