Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/14 12:57:24 UTC

[GitHub] [spark] koeninger commented on issue #21038: [SPARK-22968][DStream] Throw an exception on partition revoking issue

URL: https://github.com/apache/spark/pull/21038#issuecomment-541661690
 
 
   Read the Kafka documentation more closely. You can't have multiple
   consumers from the same group consuming the same partition. If you have
   different consumer groups, each group is going to consume the same records.
   Kafka consumer parallelism is limited by the number of partitions, and Spark
   DStream partitions are 1:1 with the Kafka partitions. If your compute cost
   per record is much greater than the cost of reading, you can shuffle in
   Spark after consuming. Otherwise your only real option is to repartition
   the Kafka topic (that is, add partitions).
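
A rough illustration of the "shuffle in Spark after consuming" option, assuming a topic with only two partitions: the direct stream still reads each Kafka partition in a single task, but a shuffle (for example `DStream.repartition(numPartitions)`) can spread the expensive per-record work over more tasks. The pure-Python sketch below models only that redistribution step; `repartition_round_robin` and the sample records are hypothetical, and this is not Spark's actual shuffle implementation.

```python
from typing import List


def repartition_round_robin(input_partitions: List[List[str]], num_partitions: int) -> List[List[str]]:
    """Redistribute records from a few input partitions over num_partitions outputs.

    Hypothetical helper: a simplified model of a shuffle, not Spark's implementation.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    output: List[List[str]] = [[] for _ in range(num_partitions)]
    i = 0
    for partition in input_partitions:
        for record in partition:
            # Deal records out like cards so every output partition gets a share.
            output[i % num_partitions].append(record)
            i += 1
    return output


# Two Kafka partitions -> two DStream partitions -> at most two reading tasks.
kafka_partitions = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]]
shuffled = repartition_round_robin(kafka_partitions, 4)
for idx, part in enumerate(shuffled):
    # After the shuffle, four partitions means four tasks can share the heavy compute.
    print(idx, part)
```

The shuffle itself costs serialization and network time, which is why it only pays off when per-record compute clearly dominates the cost of reading from Kafka.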

   "Our topic is divided into a set of totally ordered partitions, each of
   which is consumed by exactly one consumer within each subscribing consumer
   group at any given time."
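
The rule quoted above can be modeled in a few lines. This sketch assumes a simplified round-robin assignment; the helper `assign_partitions` and the consumer names are hypothetical and are not Kafka's client API. It shows why adding consumers to the same group cannot speed up a single partition: the extra consumers simply receive no partitions.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Assign each partition to exactly one consumer in a group, round-robin.

    Hypothetical, simplified model of a consumer-group assignment; Kafka's real
    assignors (range, round-robin, sticky) differ in detail but never give one
    partition to two consumers of the same group.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# One partition, three consumers in the same group: only one of them receives records.
single = assign_partitions([0], ["c1", "c2", "c3"])
print(single)  # {'c1': [0], 'c2': [], 'c3': []}

# Three partitions, five consumers: two consumers sit idle.
wide = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4", "c5"])
print(wide)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [], 'c5': []}
```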
   
   On Mon, Oct 14, 2019, 5:42 AM Anand Changediya <no...@github.com> wrote:
   
   > @koeninger <https://github.com/koeninger> According to the Kafka documentation:
   >
   > "If all the consumer instances have the same consumer group, then the
   > records will effectively be load-balanced over the consumer instances."
   > This means I can have multiple consumers with the same groupId, which can
   > help me load-balance my application and scale accordingly.
   > I don't understand why it is called "fundamentally wrong" to have multiple
   > consumers with the same groupId in Spark.
   > So how can I scale consumption of a single partition and increase the
   > consumption rate with multiple Spark consumers?
   > Is this a Spark design flaw, or is there another way to achieve this that
   > I am unaware of?
   >
   > @SehanRathnayake <https://github.com/SehanRathnayake> Any thoughts?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 