Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/20 14:25:26 UTC

[GitHub] [spark] seglo edited a comment on issue #24613: [SPARK-27549][SS] Add support for committing kafka offsets per batch for supporting external tooling

URL: https://github.com/apache/spark/pull/24613#issuecomment-494009293
 
 
   Hey everyone, just weighing in with my 2 cents.
   
   I can't speak for how all Kafka consumer group monitoring software works, but [`kafka-consumer-groups.sh`](https://kafka.apache.org/documentation/#basic_ops_consumer_lag) and [`kafka-lag-exporter`](https://github.com/lightbend/kafka-lag-exporter) both use the `AdminClient` to obtain consumer group metadata and offsets. It's true that you could consume `__consumer_offsets` and parse this information yourself, but that is an internal topic and I assume it's not meant to be consumed by external tooling. The `AdminClient` is a public-facing API that exposes offsets and other information, such as group member metadata.
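   
   To make the monitoring side concrete, here is a rough sketch of the lag arithmetic a tool performs once it has the committed offsets for a group and the log-end offsets for the same partitions. It makes no Kafka calls: the dictionaries, the `consumer_group_lag` helper, and the `events` topic are hypothetical stand-ins for what a tool would fetch through the `AdminClient`, and treating a partition with no committed offset as starting from offset 0 is a simplifying assumption.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition)


def consumer_group_lag(
    committed: Dict[TopicPartition, int],
    log_end: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag: log-end offset minus committed offset.

    Assumption: a partition with no committed offset is treated as if
    the group had committed offset 0, so its lag is the log-end offset.
    """
    lag = {}
    for tp, end in log_end.items():
        # Clamp at zero in case the committed offset was sampled
        # after the log-end offset and is slightly ahead of it.
        lag[tp] = max(end - committed.get(tp, 0), 0)
    return lag


# Hypothetical data standing in for offsets fetched for one group.id.
committed_offsets = {("events", 0): 120, ("events", 1): 95}
log_end_offsets = {("events", 0): 150, ("events", 1): 95, ("events", 2): 10}

lag_by_partition = consumer_group_lag(committed_offsets, log_end_offsets)
total_lag = sum(lag_by_partition.values())
```

   The lookup is keyed by `group.id`, so the result is only meaningful if the group actually commits offsets under an ID the tool can find.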
   
   Ideally, if the user wants to enable this feature, they would have full control of the `group.id` used. This would make Spark apps consistent with any other Kafka consumer app. More importantly, it would make monitoring consumer group lag consistent too. If Spark only allows an automatically generated ID, and that ID is regenerated on every run of the app, then this Spark-specific `group.id` generation concern would leak out and become a problem that the monitoring tool must also handle in a Spark-specific way. If the `group.id` were stable and user-chosen, the monitoring tool wouldn't need to give Spark apps any special consideration. Is there a way for the user to optionally provide a full `group.id` per Spark query?
