Posted to issues@spark.apache.org by "Gabor Somogyi (JIRA)" <ji...@apache.org> on 2019/04/25 11:55:00 UTC

[jira] [Commented] (SPARK-27549) Commit Kafka Source offsets to facilitate external tooling

    [ https://issues.apache.org/jira/browse/SPARK-27549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825994#comment-16825994 ] 

Gabor Somogyi commented on SPARK-27549:
---------------------------------------

Do you mean committing offsets all the time, or would you like to differentiate like Flink does (checkpointing enabled/disabled)?
I presume you would like to treat committed offsets the way Flink does:
{quote}The committed offsets are only a means to expose the consumer’s progress for monitoring purposes.{quote}
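The StreamingQueryListener workaround mentioned in the issue description below could be sketched roughly like this. The Kafka source's endOffset in a progress event is a JSON string of the form {"topic":{"partition":offset}}; the hand-rolled regex parser here is my own assumption (a real implementation would use a JSON library), and the commented-out commitSync call assumes a separately configured KafkaConsumer used purely for monitoring.

```scala
// Sketch: extract per-partition end offsets from a Kafka source's
// endOffset JSON, e.g. {"my-topic":{"0":123,"1":456}}.
object OffsetJson {
  // Matches a topic name and its {"partition":offset,...} body.
  private val TopicRe = """"([^"]+)"\s*:\s*\{([^}]*)\}""".r
  // Matches a single "partition":offset pair inside a topic body.
  private val EntryRe = """"(\d+)"\s*:\s*(-?\d+)""".r

  def parse(json: String): Map[(String, Int), Long] =
    TopicRe.findAllMatchIn(json).flatMap { t =>
      val topic = t.group(1)
      EntryRe.findAllMatchIn(t.group(2)).map { e =>
        (topic, e.group(1).toInt) -> e.group(2).toLong
      }
    }.toMap
}

// Usage inside a listener (requires Spark on the classpath; shown as a
// comment-level sketch rather than compiled code):
//
// class OffsetCommitListener extends StreamingQueryListener {
//   override def onQueryProgress(event: QueryProgressEvent): Unit =
//     event.progress.sources.foreach { src =>
//       OffsetJson.parse(src.endOffset).foreach { case ((topic, part), off) =>
//         // hypothetical: commit via a plain KafkaConsumer, monitoring only
//         // consumer.commitSync(Map(new TopicPartition(topic, part) ->
//         //   new OffsetAndMetadata(off)).asJava)
//       }
//     }
//   override def onQueryStarted(e: QueryStartedEvent): Unit = ()
//   override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
// }
```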


> Commit Kafka Source offsets to facilitate external tooling
> ----------------------------------------------------------
>
>                 Key: SPARK-27549
>                 URL: https://issues.apache.org/jira/browse/SPARK-27549
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Stavros Kontopoulos
>            Priority: Major
>
> Tools monitoring consumer lag could benefit from having the option of saving the source offsets. Sources implement org.apache.spark.sql.sources.v2.reader.streaming.SparkDataStream. KafkaMicroBatchStream currently [does not commit|https://github.com/apache/spark/blob/5bf5d9d854db53541956dedb03e2de8eecf65b81/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L170] anything (by design), so we could expand that.
> Other streaming engines like [Flink|https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-offset-committing-behaviour-configuration] allow you to enable `auto.commit` at the expense of not having checkpointing.
> The proposal here is to commit the source offsets whenever progress has been made.
> I am also aware that another option would be to register a StreamingQueryListener, intercept completed batches, and write the offsets wherever you need to, but it would be great if the Kafka integration with Structured Streaming could do some of this work anyway.
> [~cody@koeninger.org]  [~marmbrus] what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
