Posted to issues@spark.apache.org by "Ofir Manor (JIRA)" <ji...@apache.org> on 2016/10/07 00:22:20 UTC

[jira] [Commented] (SPARK-17815) Report committed offsets

    [ https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553662#comment-15553662 ] 

Ofir Manor commented on SPARK-17815:
------------------------------------

I think this is a good idea. There is a minor confusion here, though: as far as I understand, setting group.id is explicitly blocked (it is even documented...), so the description might need rephrasing.
1. I think auto-commit should be off, and the driver should manually commit Kafka offsets after it successfully commits a batch to HDFS (when a batch is over), so monitoring will work. I think that should happen unconditionally, unless there is a concrete performance / overhead concern (committing offsets to Kafka too frequently?).
2. Regarding manually setting group.id - that would be great. If there is a concern that users might mess up (reuse the group.id by mistake), at least allow setting a prefix for it (and provide a way to get the actual group.id).
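
To make the two points above concrete, here is a rough sketch of the behaviour I have in mind. It is not Spark or Kafka code: InMemoryOffsetStore is a hypothetical stand-in for Kafka's committed-offset storage, and StreamingQuery, make_group_id, run_batch and the sink callback are names invented purely for illustration. The assumptions are that the sink raises on failure and that end_offsets holds the next offset to read for each partition.

```python
import uuid
from typing import Callable, Dict, List, Optional, Tuple

# (topic, partition) pair; a plain tuple keeps the sketch dependency-free.
TopicPartition = Tuple[str, int]


def make_group_id(prefix: str) -> str:
    """Build a group.id from a user prefix plus a random suffix.

    The suffix means two queries cannot share a group.id by accident,
    while the prefix keeps the id recognizable in monitoring tools.
    """
    return f"{prefix}-{uuid.uuid4().hex}"


class InMemoryOffsetStore:
    """Hypothetical stand-in for Kafka's committed-offset storage."""

    def __init__(self) -> None:
        self._committed: Dict[Tuple[str, TopicPartition], int] = {}

    def commit(self, group_id: str, offsets: Dict[TopicPartition, int]) -> None:
        for tp, offset in offsets.items():
            self._committed[(group_id, tp)] = offset

    def committed(self, group_id: str, tp: TopicPartition) -> Optional[int]:
        return self._committed.get((group_id, tp))


class StreamingQuery:
    """Hypothetical driver-side loop: write the batch first, then report offsets."""

    def __init__(
        self,
        group_prefix: str,
        store: InMemoryOffsetStore,
        sink: Callable[[List[str]], None],
    ) -> None:
        # The actual group.id is exposed so external tools can look it up.
        self.group_id = make_group_id(group_prefix)
        self._store = store
        self._sink = sink

    def run_batch(self, records: List[str], end_offsets: Dict[TopicPartition, int]) -> None:
        # Assumed contract: the sink raises if the batch could not be written.
        self._sink(records)
        # Reached only after a successful write, so the reported offsets
        # never run ahead of the data that is actually in the sink.
        self._store.commit(self.group_id, end_offsets)
```

For example, after StreamingQuery("etl", store, sink).run_batch(records, {("events", 0): 2}) succeeds, store.committed(query.group_id, ("events", 0)) returns 2. If the sink raises, nothing is committed for that batch and a monitoring tool keeps seeing the previous position.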

> Report committed offsets
> ------------------------
>
>                 Key: SPARK-17815
>                 URL: https://issues.apache.org/jira/browse/SPARK-17815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Since we manage our own offsets, we have turned off auto-commit.  However, this means that external tools are not able to report on how far behind a given streaming job is.  When the user manually gives us a group.id, we should report back to it.


