Posted to issues@spark.apache.org by "Stavros Kontopoulos (JIRA)" <ji...@apache.org> on 2019/04/23 21:24:00 UTC

[jira] [Created] (SPARK-27549) Commit Kafka Source offsets to facilitate external tooling

Stavros Kontopoulos created SPARK-27549:
-------------------------------------------

             Summary: Commit Kafka Source offsets to facilitate external tooling
                 Key: SPARK-27549
                 URL: https://issues.apache.org/jira/browse/SPARK-27549
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Stavros Kontopoulos


Tools that monitor consumer lag could benefit from an option to save (commit) the source offsets. Sources implement org.apache.spark.sql.sources.v2.reader.streaming.SparkDataStream. As expected, KafkaMicroBatchStream currently [does not commit|https://github.com/apache/spark/blob/5bf5d9d854db53541956dedb03e2de8eecf65b81/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L170] anything, so we could expand that.

Other streaming engines like [Flink|https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-offset-committing-behaviour-configuration] allow you to rely on Kafka's periodic auto-commit (`enable.auto.commit`) at the expense of not having checkpointing.

The proposal here is to allow committing the source's offsets whenever progress has been made.

I am also aware that another option would be to have a StreamingQueryListener that intercepts batch completion and writes the offsets wherever you need them, but it would be great if the Kafka integration with Structured Streaming could do some of this work itself.
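
As a rough sketch of the listener-based workaround, the core step is turning the end offset reported for a completed batch into per-partition offsets that an external tool can store or commit. The snippet assumes the Kafka source reports its end offset as a JSON string of the form {"topic": {"partition": offset}}; the function names and the sample topic are hypothetical and not part of Spark or Kafka. It also shows the lag computation a monitoring tool might then perform.

```python
import json
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def end_offsets_to_commit_map(end_offset_json: str) -> Dict[TopicPartition, int]:
    """Flatten {"topic": {"partition": offset}} into {(topic, partition): offset}.

    Hypothetical helper: the input format is an assumption about what the
    Kafka source reports as its end offset in a progress event.
    """
    parsed = json.loads(end_offset_json)
    commit_map: Dict[TopicPartition, int] = {}
    for topic, partitions in parsed.items():
        for partition, offset in partitions.items():
            # JSON object keys are strings, so partition ids need converting.
            commit_map[(topic, int(partition))] = int(offset)
    return commit_map


def consumer_lag(
    log_end_offsets: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Per-partition lag: log end offset minus committed offset.

    Partitions without a committed offset are skipped rather than guessed.
    """
    return {
        tp: end - committed[tp]
        for tp, end in log_end_offsets.items()
        if tp in committed
    }


if __name__ == "__main__":
    # Sample end offset for a hypothetical topic "events" with two partitions.
    committed = end_offsets_to_commit_map('{"events": {"0": 120, "1": 95}}')
    print(committed)
    print(consumer_lag({("events", 0): 150, ("events", 1): 95}, committed))
```

In a real listener, the resulting map would be handed to whatever store or Kafka consumer group the monitoring tool reads from; that part is left out here because it depends on the deployment.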

[~cody@koeninger.org]  [~marmbrus] what do you think?



