Posted to issues@spark.apache.org by "Ofir Manor (JIRA)" <ji...@apache.org> on 2016/10/07 00:22:20 UTC
[jira] [Commented] (SPARK-17815) Report committed offsets
[ https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553662#comment-15553662 ]
Ofir Manor commented on SPARK-17815:
------------------------------------
I think this is a good idea. There is a minor confusion here, though: as far as I understand, setting group.id is explicitly blocked (it is even documented...). So the description might need rephrasing.
1. I think auto-commit should be off, and the driver should manually commit Kafka offsets after it successfully commits a batch to HDFS (when a batch is over), so monitoring will work. I think that should happen unconditionally, unless there is a concrete performance / overhead concern (committing offsets to Kafka too frequently?).
2. Regarding manually setting group.id - that would be great. If there is a concern that users might mess up (reuse the group.id by mistake), at least allow setting a prefix for it (and provide a way to get the actual group.id).
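To make the two points above concrete, here is a minimal sketch in Python. It is only an illustration of the ordering and naming I have in mind, not Spark's or Kafka's actual API: InMemoryOffsetStore, make_group_id and run_batch are hypothetical names, the offset store is an in-memory stand-in for Kafka's committed offsets, and write_batch stands in for the HDFS commit.

```python
import uuid
from typing import Callable, Dict, Tuple

# All names below are hypothetical and exist only for illustration.
# (topic, partition) -> next offset to read
Offsets = Dict[Tuple[str, int], int]


class InMemoryOffsetStore:
    """In-memory stand-in for the offsets committed to Kafka by one group."""

    def __init__(self, group_id: str) -> None:
        self.group_id = group_id
        self.committed: Offsets = {}

    def commit(self, offsets: Offsets) -> None:
        # Record the latest committed position per topic-partition.
        self.committed.update(offsets)


def make_group_id(prefix: str) -> str:
    """Derive a unique group.id from a user-supplied prefix (point 2)."""
    return f"{prefix}-{uuid.uuid4().hex}"


def run_batch(
    store: InMemoryOffsetStore,
    end_offsets: Offsets,
    write_batch: Callable[[], None],
) -> bool:
    """Commit offsets only after the sink write has succeeded (point 1)."""
    try:
        write_batch()  # stand-in for committing the batch to HDFS
    except Exception:
        # The batch did not land, so the committed offsets must not advance.
        return False
    store.commit(end_offsets)
    return True
```

With this ordering, an external monitoring tool reading the committed offsets would never see the job as further along than what has actually been written out, and store.group_id gives the user a way to look up the generated group.id.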
> Report committed offsets
> ------------------------
>
> Key: SPARK-17815
> URL: https://issues.apache.org/jira/browse/SPARK-17815
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Michael Armbrust
>
> Since we manage our own offsets, we have turned off auto-commit. However, this means that external tools are not able to report on how far behind a given streaming job is. When the user manually gives us a group.id, we should report back to it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)