Posted to users@kafka.apache.org by Reza Aliakbari <ra...@gmail.com> on 2015/09/10 16:54:39 UTC

Partition Consumer(s)

Hi Everybody,

I have 2 questions regarding the way consumers consume messages of a
partition.


   - Is it possible to configure Kafka to allow concurrent message
   consumption from a single partition? Ordering is not a concern for me
   at all.

           I couldn't find any way to do that with the consumer group
approach. If it is possible, please let me know; if it is not, then let me
know how to address this problem:
           For some reason, a consumer that is assigned to a partition could
get very slow, and its messages would be processed very slowly. How can I
detect this and stop producing to that slow partition?




   - Suppose I have 5 partitions and 3 consumers and I am using the
   consumer group model (I had 5 consumers at the start, but 2 servers
   crashed). The 3 consumers are busy with their 3 partitions and never
   finish, since the producer writes to their partitions non-stop and a
   little faster than they can consume. What happens to the other 2
   partitions that are missing consumers? How can the consumer group
   handle this issue?


Order does not matter to me. I need a simple configuration that addresses
my concurrency needs, and I need to make sure no message ends up in a
starvation scenario where it is never consumed.

Please let us know; we want to choose between Kafka and RabbitMQ, and we
prefer Kafka because of its growing community and high throughput, but
first we need to address these basic needs.


Thanks,

Reza Aliakbari

Re: Partition Consumer(s)

Posted by "Helleren, Erik" <Er...@cmegroup.com>.
Kafka can only do so much.  Kafka's high-level consumer API does guarantee
at-least-once delivery to a living consumer's message consumption
function.  Kafka can't guarantee that the business logic that handles that
message won't hang or do things to circumvent its guarantees.

But, since we are talking about business logic, there could be some sort of
timeout semantic that gives up on an individual message after a while.
This is what some do in queue architectures to let a hung thread get back
to doing work, provided they can afford to lose bad messages in that kind
of situation.  At that point the app can re-enqueue the message based on
business needs, or it can drop it.  The result is either a lost message or
a delay in delivery for a single message, in favor of availability and
resource utilization.
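
As a rough illustration (not Kafka-specific; handleMessage() below stands
in for whatever the business logic is, and the pool size and timeout are
arbitrary), the timeout can be enforced with a plain java.util.concurrent
Future:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedMessageHandler {
    // Small pool, so one handler that ignores its interrupt does not
    // block every later message.
    private final ExecutorService handlers = Executors.newFixedThreadPool(4);

    /** Returns true if the handler finished in time, false if we gave up. */
    public boolean handleWithTimeout(final byte[] message, long timeoutMs)
            throws InterruptedException {
        Future<?> result = handlers.submit(new Runnable() {
            public void run() {
                handleMessage(message);   // business logic that might hang
            }
        });
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true);          // interrupt the hung handler
            return false;                 // caller re-enqueues or drops it
        } catch (ExecutionException e) {
            return false;                 // handler threw; treat as a bad message
        }
    }

    private void handleMessage(byte[] message) {
        // ... e.g. send an email ...
    }
}

Whether the false case re-enqueues (for example by producing the message
back onto the topic) or drops it is exactly the business decision above.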

But, if this app requires blocking network writes over the internet per
message, RabbitMQ or ActiveMQ might just be better choices.  But even
then, if the logic is subject to hanging, you will still need to code for
that case.
-Erik



Re: Partition Consumer(s)

Posted by Reza Aliakbari <ra...@gmail.com>.
Monitoring and killing the bad consumer is not a good solution; if my
consumer can't keep up with its partition even when there are idle
threads, then I have a bad design.

I can't design a system that in some situations fails to deliver thousands
of emails because one thread couldn't keep up (even when I have a
sufficient number of partitions).

So I understand that Kafka doesn't provide concurrency in the form that
RabbitMQ provides.

I just can't understand why any message should be delayed when I have
enough machines and idle threads.




Re: Partition Consumer(s)

Posted by "Helleren, Erik" <Er...@cmegroup.com>.
So, the general scalability approach with Kafka is to add more partitions
to scale.  If you are using consumer groups and the high-level consumer
API, redistribution of partitions is automatic on failover of a member of
a consumer group.  But the high-level consumer doesn't allow a
configuration to break up partitions, as is noted here:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
There isn't really any way for multiple separate clients on separate JVMs
to coordinate their consumption off of a single partition efficiently.  So
the solution is simply to break a topic up into enough partitions that a
single partition is a reasonable unit to scale a consumer by.  If a
consumer can only handle a single partition or, worse, is falling behind,
your partitions are too large and need to be adjusted.
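
For reference, here is a rough sketch of joining a consumer group with the
0.8.x high-level consumer, along the lines of the Consumer Group Example
linked above (the topic name, group id, ZooKeeper address and stream count
are all illustrative):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "email-senders"); // members sharing this id share the partitions

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("emails", 4);          // ask for 4 streams (threads) in this JVM

        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        // Each stream is fed by one or more whole partitions; a partition is
        // never shared between streams or between group members, which is why
        // the partition is the unit of scale.
        for (KafkaStream<byte[], byte[]> stream : streams.get("emails")) {
            // hand each stream to its own consuming runnable (see the wiki example)
        }
    }
}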

And if for some reason a process hangs on a partition, kill it and start
up a new one. Provided partitions are a reasonable unit of scale, it
shouldn't be a problem.  There will be a latency spike, but that's better
than starvation. You can also split processing of a single partition pretty
easily within a JVM: the Kafka-consuming runnable can just put messages
into a concurrent queue of some sort, and a large thread pool can then
pull from that queue to do the processing.  That way, if a thread in the
pool gets hung, there are many left to consume off the queue, so nothing
gets held up.  But this adds some risk on failover, based on how Kafka
does offset management for the high-level consumer.
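
A minimal sketch of that hand-off (the class and its names are mine, not
from Kafka; submit() would be called from inside each per-stream consuming
runnable):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PartitionFanOut {
    // Bounded queue, so a slow pool pushes back on the consumer thread
    // instead of exhausting memory.
    private final BlockingQueue<byte[]> queue =
            new ArrayBlockingQueue<byte[]>(10000);
    private final ExecutorService workers = Executors.newFixedThreadPool(16);

    /** Called by the Kafka-consuming runnable for every message it reads. */
    public void submit(byte[] message) throws InterruptedException {
        queue.put(message);               // blocks if the workers fall behind
    }

    /** Starts the worker pool that does the slow processing. */
    public void start() {
        for (int i = 0; i < 16; i++) {
            workers.execute(new Runnable() {
                public void run() {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            byte[] message = queue.take();
                            process(message); // a hang here only costs one worker
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    private void process(byte[] message) {
        // ... e.g. send the email ...
    }
}

The failover risk mentioned above is that, with auto-committed offsets,
the high-level consumer can commit past messages that are still sitting in
the in-memory queue, so a crash at that moment loses them.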

So, I don't think that sending backoff messages to a producer to let up on
a partition is a good design pattern for Kafka. Again, the solution is
more partitions.  But offset data is stored in either Kafka or ZooKeeper,
depending on your configuration, which can tell you how many messages your
consumer is behind by.  But since messages being published should be
evenly distributed across all partitions of a topic, all partitions
should be lagging roughly equally.
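
(Per partition the lag is just:

    lag = log end offset of the partition - last offset committed by the group

and, if I recall correctly, the kafka.tools.ConsumerOffsetChecker tool that
ships with 0.8.x reports this per partition for a given group.)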

If you need a true unified queue, RabbitMQ might be right for your needs.
But if order doesn't matter at all, Kafka should give you more throughput
with enough partitions.  And since order doesn't matter, you have a lot of
flexibility here.

Also, an alternative to doing everything in a native Java client is to use
a Spark application.  It makes fanning out your data very easy, and it has
some semantics that make it well suited to some of these concerns.


