Posted to user@spark.apache.org by Sjoerd van Leent <sj...@alliander.com> on 2020/03/17 14:33:53 UTC

Problem with Kafka group.id

Dear reader,

I must set the Kafka group.id explicitly, because our Kafka cluster is under ACL control. However, doing so gives me the error:

Kafka option 'group.id' is not supported as user-specified consumer groups are not used to track offsets.

This won't work for us: not being able to set it basically disqualifies Spark within our organization. How can I force (Py)Spark to respect the group.id I specify?

Kind regards,

Sjoerd van Leent
System Engineer | IT AST-B&R CSC

M   +31 6 11 24 52 27
E    sjoerd.van.leent@alliander.com

Alliander N.V.  .  Postbus 50, 6920 AB Duiven, Nederland  .  Locatiecode: 2PB2100  .  Utrechtseweg 68, 6812 AH Arnhem  .  KvK 09104351 Arnhem  .  www.alliander.com

The contents of this e-mail, including attachments, are personal and confidential. If this message is not intended for you, please inform the sender by return and delete this message. Please do not use, copy, or forward this e-mail, including any attachments, to third parties.

Re: Problem with Kafka group.id

Posted by Gabor Somogyi <ga...@gmail.com>.

Hi Sjoerd,

We've added kafka.group.id config to Spark 3.0...

Option:      kafka.group.id
Value:       string
Default:     none
Query type:  streaming and batch
Meaning:

The Kafka group id to use in Kafka consumer while reading from Kafka. Use
this with caution. By default, each query generates a unique group id for
reading data. This ensures that each Kafka source has its own consumer
group that does not face interference from any other consumer, and
therefore can read all of the partitions of its subscribed topics. In some
scenarios (for example, Kafka group-based authorization), you may want to
use a specific authorized group id to read data. You can optionally set the
group id. However, do this with extreme caution as it can cause unexpected
behavior. Concurrently running queries (both batch and streaming) or
sources with the same group id are likely to interfere with each other,
causing each query to read only part of the data. This may also occur when
queries are started/restarted in quick succession. To minimize such issues,
set the Kafka consumer session timeout (by setting option
"kafka.session.timeout.ms") to be very small. When this is set, option
"groupIdPrefix" will be ignored.

BR,
G
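
For readers on Spark 3.0 or later, the kafka.group.id option described above might be passed from PySpark roughly as follows. This is only a minimal sketch: the helper names (kafka_source_options, read_kafka_stream), the broker address, the topic and the group id are placeholders invented for illustration, not values from this thread. read_kafka_stream assumes a SparkSession that has the spark-sql-kafka-0-10 connector available.

```python
def kafka_source_options(bootstrap_servers, topic, group_id, session_timeout_ms=None):
    """Build the option map for Spark's Kafka source (Spark 3.0+).

    All values are supplied by the caller; nothing here is tied to a
    particular cluster.
    """
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # The consumer group that the Kafka ACLs authorize. According to the
        # documentation quoted above, "groupIdPrefix" is ignored once this is set.
        "kafka.group.id": group_id,
    }
    if session_timeout_ms is not None:
        # Optional consumer session timeout, passed through to Kafka as a string.
        options["kafka.session.timeout.ms"] = str(session_timeout_ms)
    return options


def read_kafka_stream(spark, options):
    """Open a streaming DataFrame on Kafka; `spark` is a SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    opts = kafka_source_options(
        "broker.example.com:9092", "events", "authorized-group", 10000
    )
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

Keeping the option map separate from the reader call makes it easy to inspect or log the exact settings before a query starts. As the quoted documentation warns, queries that share one group id can interfere with each other, so each concurrently running query would probably need its own authorized group.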


Re: Problem with Kafka group.id

Posted by Sjoerd van Leent <sj...@alliander.com>.

This is exactly the issue I am fighting against. Within a good number of organizations, this is against policy. Another solution is necessary.

________________________________
From: Spico Florin <sp...@gmail.com>
Sent: Tuesday, March 24, 2020 11:23:29 AM
To: Sethupathi T <se...@googlemail.com.invalid>
Cc: Sjoerd van Leent <sj...@alliander.com>; user@spark.apache.org <us...@spark.apache.org>
Subject: Re: Problem with Kafka group.id

Hello!

Maybe you can find more information on the same issue reported here:
https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSourceProvider.html

validateGeneralOptions makes sure that group.id has not been specified and reports an IllegalArgumentException otherwise:

Kafka option 'group.id' is not supported as user-specified consumer groups are not used to track offsets.

https://github.com/Azure/azure-event-hubs-for-kafka/issues/35

I hope it helps, Florin

On Mon, Mar 23, 2020 at 5:45 PM Sethupathi T <se...@googlemail.com.invalid> wrote:
I had the exact same issue. My temporary fix was to take the open-source code from GitHub, modify the group.id validation logic, and build a customized library.

Thanks,


Re: Problem with Kafka group.id

Posted by Spico Florin <sp...@gmail.com>.

Hello!

Maybe you can find more information on the same issue reported here:
https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSourceProvider.html


validateGeneralOptions makes sure that group.id has not been specified and
reports an IllegalArgumentException otherwise:

Kafka option 'group.id' is not supported as user-specified consumer
groups are not used to track offsets.

https://github.com/Azure/azure-event-hubs-for-kafka/issues/35

I hope it helps, Florin
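
The check described above can be pictured roughly like this. It is a Python sketch of the behaviour only, not Spark's actual source (which is Scala). The function name validate_general_options, the use of ValueError in place of IllegalArgumentException, and the assumption that the rejected key is the prefixed option name kafka.group.id are all illustrative.

```python
def validate_general_options(params):
    """Reject a user-specified consumer group, as pre-3.0 Spark is described as doing.

    `params` maps option names to values. Option names are compared
    case-insensitively in this sketch.
    """
    lowered = {key.lower() for key in params}
    # Assumption: the rejected option is the "kafka."-prefixed consumer setting.
    if "kafka.group.id" in lowered:
        raise ValueError(
            "Kafka option 'group.id' is not supported as user-specified "
            "consumer groups are not used to track offsets."
        )


if __name__ == "__main__":
    # Placeholder topic and group names, for illustration only.
    validate_general_options({"subscribe": "events"})  # passes silently
    try:
        validate_general_options({"kafka.group.id": "authorized-group"})
    except ValueError as err:
        print(err)
```

A check of this shape runs before any consumer is created, which would explain why the error appears as soon as the option is supplied.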


Re: Problem with Kafka group.id

Posted by Sethupathi T <se...@googlemail.com.INVALID>.

I had the exact same issue. My temporary fix was to take the open-source
code from GitHub, modify the group.id validation logic, and build a
customized library.

Thanks,
