Posted to issues@spark.apache.org by "Cédric Chantepie (Jira)" <ji...@apache.org> on 2022/03/31 13:49:00 UTC
[jira] [Created] (SPARK-38715) Would be nice to be able to configure a client ID pattern in Kafka integration
Cédric Chantepie created SPARK-38715:
----------------------------------------
Summary: Would be nice to be able to configure a client ID pattern in Kafka integration
Key: SPARK-38715
URL: https://issues.apache.org/jira/browse/SPARK-38715
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.0.0
Reporter: Cédric Chantepie
By default, the Kafka client automatically generates a unique client ID.
The client ID is used by many data lineage tools to identify consumers and producers (for consumers the consumer group can also be used, but for producers only the client ID is available).
Setting the [client.id](https://kafka.apache.org/documentation/#producerconfigs_client.id) in the options passed to a Spark Kafka read or write is not possible, as it would force the same client.id on at least both the driver and the executors.
One option would be to support a Spark-specific option, maybe named `clientIdPrefix`.
For example:
```scala
val df = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .option("clientIdPrefix", "my-workflow-")
  .load()
```
A possible implementation would be to update [InternalKafkaProducerPool](https://github.com/apache/spark/blob/master/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/producer/InternalKafkaProducerPool.scala#L75), or maybe Spark's `KafkaConfigUpdater`?
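To illustrate the idea, here is a minimal sketch of how a prefix could be turned into a distinct `client.id` for each Kafka client. It is not taken from the Spark code base: the object `ClientIdPrefixSketch`, the helper `withPrefixedClientId`, and its `role` parameter are hypothetical names, and the only real Kafka name used is the `client.id` configuration key.

```scala
import java.util.UUID

// Hypothetical sketch, not Spark's actual implementation.
object ClientIdPrefixSketch {
  // Real Kafka configuration key for the client ID.
  val ClientIdKey = "client.id"

  /**
   * Returns a copy of `kafkaParams` with a unique `client.id` that starts
   * with `clientIdPrefix`. When no prefix is given the parameters are
   * returned unchanged, so Kafka keeps generating its own client ID.
   *
   * `role` (e.g. "driver" or "executor") is an assumed extra label that
   * would make the IDs easier to read in lineage tools.
   */
  def withPrefixedClientId(
      kafkaParams: Map[String, Object],
      clientIdPrefix: Option[String],
      role: String): Map[String, Object] =
    clientIdPrefix match {
      case Some(prefix) =>
        // A random UUID suffix keeps each client ID unique.
        kafkaParams + (ClientIdKey -> s"$prefix$role-${UUID.randomUUID()}")
      case None =>
        kafkaParams
    }
}
```

With this approach, clients created on the driver and on the executors would share the `my-workflow-` prefix while still having distinct client IDs.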