Posted to users@kafka.apache.org by "Balasubramanian Jayaraman (Contingent)" <ba...@autodesk.com> on 2014/03/05 11:13:18 UTC

Reg Partition

Hi

I have a question about parallelism. Why is the number of consumers that can consume messages from a topic in parallel limited by the number of partitions configured for that topic?
Why should the partition count affect the number of parallel consumers?

Thanks
Bala

Re: Reg Partition

Posted by David Birdsong <da...@gmail.com>.
On Sun, Mar 9, 2014 at 6:09 PM, Balasubramanian Jayaraman (Contingent) <balasubramanian.jayaraman@autodesk.com> wrote:

> Thanks Martin. We are still in the design phase. I wanted to clarify my
> doubt on the relation between parallelism and partitions.
>

Kafka is a distributed, ordered commit log. It consumes underlying
resources, in most cases a disk spindle.

The partition is just an abstraction of that underlying resource.
Administrators need to know about it so they can deploy Kafka and monitor it
correctly, and developers need to know about it to take full advantage of
those resources.



RE: Reg Partition

Posted by "Balasubramanian Jayaraman (Contingent)" <ba...@autodesk.com>.
Thanks Martin. We are still in the design phase. I wanted to clarify my understanding of the relation between parallelism and partitions.



Re: Reg Partition

Posted by Martin Kleppmann <mk...@linkedin.com>.
You can certainly have several consumers consuming from the same partition: just give each a different consumer group ID, and then all the messages from the partition will be delivered to all of the consumers.

If you want each message to only be processed by one of the consumers, you can drop those that you don't want: for example, consumer 1 ignores all messages with an even-numbered offset, and consumer 2 ignores all messages with an odd-numbered offset.
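As a rough illustration of the odd/even scheme described above, here is a minimal sketch in plain Python. It does not use a Kafka client: the partition is modelled as a list whose index stands in for the offset, and `consume_with_parity` is a hypothetical helper name chosen for this example. Each consumer is assumed to be in its own consumer group, so each one reads the full partition and then drops the messages it does not want.

```python
def consume_with_parity(log, keep_parity):
    """Read every message in the partition, keep only offsets with the given parity.

    `log` is a list standing in for one partition (index == offset).
    `keep_parity` is 0 to keep even offsets, 1 to keep odd offsets.
    """
    return [(offset, message)
            for offset, message in enumerate(log)
            if offset % 2 == keep_parity]


partition = ["m0", "m1", "m2", "m3", "m4"]

# Consumer 1 ignores even-numbered offsets, so it keeps the odd ones.
consumer_1 = consume_with_parity(partition, keep_parity=1)
# Consumer 2 ignores odd-numbered offsets, so it keeps the even ones.
consumer_2 = consume_with_parity(partition, keep_parity=0)
```

Note that both consumers still fetch every message from the partition. Only the processing work is split, not the read load.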

However, I don't understand why you want to have multiple consumers on the same partition in the first place. Can't you simply configure your topic to have enough partitions that you can achieve the required parallelism? That's what partitions are for.

Martin



RE: Reg Partition

Posted by "Balasubramanian Jayaraman (Contingent)" <ba...@autodesk.com>.
Thanks Martin.
I understand. We are considering this design to improve performance. Would there be any harm in having several consumers consume from the same partition, if I can tolerate some slowness or performance degradation?

Regards
Bala



Re: Reg Partition

Posted by Martin Kleppmann <mk...@linkedin.com>.
Hi Bala,

The way Kafka works, each partition is a sequence of messages in the order that they were produced, and each message has a position (offset) in this sequence. Kafka brokers don't keep track of which consumer has seen which messages. Instead, each consumer keeps track of the latest offset it has seen: because they are consumed in sequential order, all messages with a smaller offset have been consumed, and all messages with a greater offset have not yet been consumed. Explained in detail here: http://kafka.apache.org/documentation.html#theconsumer
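The offset bookkeeping described above can be sketched in a few lines of plain Python. This is a toy model, not Kafka's client API: `Partition` and `Consumer` are hypothetical classes written for this example, and the consumer stores the next offset to read (one past the latest offset it has seen). The point is that the consumer's entire progress is a single integer per partition.

```python
class Partition:
    """An append-only log; a message's offset is its index in the list."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def read(self, offset, max_messages):
        # Return up to max_messages messages starting at the given offset.
        return self._log[offset:offset + max_messages]


class Consumer:
    """Tracks its own position; the partition knows nothing about it."""

    def __init__(self, partition):
        self.partition = partition
        self.next_offset = 0  # everything below this offset has been consumed

    def poll(self, max_messages=10):
        batch = self.partition.read(self.next_offset, max_messages)
        self.next_offset += len(batch)
        return batch


partition = Partition()
for message in ["a", "b", "c"]:
    partition.append(message)

consumer = Consumer(partition)
first_batch = consumer.poll(max_messages=2)
second_batch = consumer.poll(max_messages=2)
```

Because messages are read strictly in order, `next_offset` alone separates what has been consumed from what has not.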

If you wanted to have several consumers consume from the same partition, they would have to keep communicating in order to know which one has processed which messages (otherwise they'd end up processing the same message twice). This would be extremely inefficient.

It's much easier and much more performant to assign each partition to only one consumer, so each consumer only needs to keep track of its own partition offsets. A consequence of that design is that, within a consumer group, you cannot have more active consumers than partitions.
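To see why the partition count caps the parallelism, consider a simplified assignment of partitions to the consumers in one group. The sketch below is an assumption for illustration only: `assign_partitions` is a hypothetical function that hands out partitions round-robin, and it is not Kafka's actual assignment algorithm. What it shares with the design described above is the rule that each partition goes to exactly one consumer.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin.

    Returns a dict mapping consumer name -> list of assigned partitions.
    Consumers beyond the number of partitions end up with an empty list.
    """
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions, four consumers: the fourth consumer has nothing to do.
too_many_consumers = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])

# Four partitions, two consumers: each consumer handles two partitions.
fewer_consumers = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
```

With three partitions and four consumers, one consumer sits idle, so adding consumers beyond the partition count adds no parallelism. Adding partitions is what raises the ceiling.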

Martin
