Posted to users@kafka.apache.org by Eric Sites <Er...@threattrack.com> on 2013/08/14 03:47:10 UTC

Very low volume topic

Hello everyone,

I have a very low volume topic that has 2 consumers in the same group. How do I get each consumer to consume only 1 message at a time and, if the first consumer is busy, get the other consumer to consume the message?

Currently what I am doing is:

The first consumer connects to Kafka, waits for 300 milliseconds, then disconnects, waits for 10 seconds, and reconnects to see if there is a waiting message.

The messages kick off a long task on each server. Each server can handle multiple tasks up to a limit, so I am trying to balance the tasks across multiple servers and, if a server is maxed out, have it not consume any messages.

This gives the other server or servers a chance to pick up a message and do the task.

I would not disconnect if I could ensure that messages are not left waiting in the queue for one server while the other servers are unable to see them.

Thanks for the help...

Cheers,
Eric Sites



Re: Very low volume topic

Posted by Jun Rao <ju...@gmail.com>.
Not sure if there is an easy way to do what you want. One thing I can think
of is to reduce queuedchunks.max and fetch.size to reduce the # of queued
messages.

Thanks,

Jun
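
For readers on 0.8: as far as I can tell, queuedchunks.max and fetch.size are the 0.7 names, and the 0.8 high-level consumer calls them queued.max.message.chunks and fetch.message.max.bytes. The sketch below only renders a small properties fragment with those keys. The group name and all values are made up for illustration, and fetch.message.max.bytes has to stay at least as large as your biggest message.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


# Illustrative values only, not recommendations.
low_prefetch = {
    "group.id": "job-starters",             # hypothetical group name
    "queued.max.message.chunks": 1,         # buffer at most one fetched chunk
    "fetch.message.max.bytes": 64 * 1024,   # must be >= the largest message
    "consumer.timeout.ms": 300,             # the 300 ms wait mentioned in the thread
}

fragment = render_properties(low_prefetch)
print(fragment)
```

Smaller prefetch limits how many messages sit buffered inside a busy consumer, but it does not let another consumer in the group read from a partition that consumer owns.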


Re: Very low volume topic

Posted by Philip O'Toole <ph...@loggly.com>.
OK, I think I follow you.

Well, if the message volume is very low, then I don't think you need
the performance of Kafka. A different design, where your workers pull
from a shared in-memory queue, might be better (perhaps *that* queue
could be filled by a Kafka consumer reading from an actual Kafka
topic). Yes, the queue may need synchronization to ensure each job
only gets pulled off the queue once, but you said it's low volume, so
performance shouldn't be a concern.

Philip
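
A minimal sketch of the shared-queue idea, assuming the workers are threads in one process (in the thread they are whole servers, so a real version would need a networked queue). The names run_workers and STOP are invented for the example, and the loop that fills the queue stands in for the single Kafka consumer. Python's queue.Queue does the synchronization, so each job is handed to exactly one worker.

```python
import queue
import threading


def run_workers(jobs, num_workers=2):
    """Feed jobs into a shared queue and let workers pull them one at a time.

    Returns a dict mapping worker name -> list of jobs that worker handled.
    """
    shared = queue.Queue()  # thread-safe: each get() returns a job to one worker only
    handled = {f"worker-{i}": [] for i in range(num_workers)}
    STOP = object()  # sentinel telling a worker to exit

    def worker(name):
        while True:
            job = shared.get()  # blocks until a job (or STOP) is available
            if job is STOP:
                break
            handled[name].append(job)  # stand-in for the long-running task

    threads = [threading.Thread(target=worker, args=(name,)) for name in handled]
    for t in threads:
        t.start()

    # In the design described above, this loop would be the Kafka consumer.
    for job in jobs:
        shared.put(job)
    for _ in threads:
        shared.put(STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    return handled


result = run_workers([f"job-{n}" for n in range(10)])
all_jobs = sorted(job for jobs in result.values() for job in jobs)
print(all_jobs)
```

A worker that is busy simply does not call get(), so an idle worker picks up the next job without any connect/disconnect cycle.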


Re: Very low volume topic

Posted by Eric Sites <Er...@threattrack.com>.
Responses inline

On 8/13/13 9:57 PM, "Philip O'Toole" <ph...@loggly.com> wrote:

>My experience is solely with 0.7.2. More inline.

I am currently using 0.8.

>
>On Tue, Aug 13, 2013 at 6:47 PM, Eric Sites <Er...@threattrack.com>
>wrote:
>> Hello everyone,
>>
>> I have a very low volume topic that has 2 consumers in the same group.
>>How do I get each consumer to only consume 1 message at a time and if
>>the first consumer is busy get the other consumer to consume the
>>message?
>
>You can't, not if you only have one partition. Within a consumer group,
>each partition is read by a single consumer, unless you deliberately
>tear down that consumer and let another take over the partition (if you
>are using the high-level consumer).

I am using multiple partitions, currently 4 partitions.

>
>>
>> Currently what I am doing is:
>>
>> First consumer connects to Kafka waits for 300 milliseconds then
>>disconnects, waits for 10 seconds, then reconnects to see if there is a
>>waiting message.
>
>I don't think you need to do this. The high-level consumer has an API that
>allows you to set this timeout (I think).

I am using that timeout on the high-level consumer; that is the
300-millisecond wait period. Then I call consumer.shutdown(), wait 10
seconds, and reconnect.

>
>>
>> The messages kick off a long task on each server, each server can
>>handle multiple tasks up to a limit so first I am trying to balance the
>>tasks across multiple servers and if they are maxed out don't consume
>>any messages.
>>
>> This will give the other server or servers a chance to pick up a
>>message and do the task.
>>
>> I would not disconnect if I can ensure I don't have messages waiting in
>>the queue for a server to consume them without the other servers being
>>able to see them.
>
>I think a better design would be to have a basic consumer that drains
>the topic and hands jobs to the set of available workers. *Those*
>workers perform the long-running job. Only if there are no available
>workers does the consumer block. You may be trying to do too much in
>the consumer.

The available workers are entire servers that can produce lots of network
IO and generate 100k+ Kafka messages to other Kafka topics that get
consumed by Hadoop and other systems.

I used Kafka for these start-job messages because I was already using
Kafka for other messages, and I will most likely add more servers to
consume these start-job messages.

I don't know how long the job will take until I consume the start-job
message. Sometimes it may take only seconds; sometimes it could take hours.

I have a managed thread pool that only allows x number of task types to
run at one time from each job, so that one job does not overwhelm a single
server. This allows a server to handle multiple things while waiting on
the network IO.

My only issue is balancing the start-job messages across multiple servers
depending on each server's load and available threads in the thread pool.

The only real issue I am currently having is that I think this frequent
connect/disconnect is causing issues on the Kafka servers, with the 4
partitions being rebalanced back and forth between the worker servers.

>
>>
>> Thanks for the help...
>>
>> Cheers,
>> Eric Sites
>>
>>

- Eric Sites


Re: Very low volume topic

Posted by Philip O'Toole <ph...@loggly.com>.
My experience is solely with 0.7.2. More inline.

On Tue, Aug 13, 2013 at 6:47 PM, Eric Sites <Er...@threattrack.com> wrote:
> Hello everyone,
>
> I have a very low volume topic that has 2 consumers in the same group. How do I get each consumer to only consume 1 message at a time and if the first consumer is busy get the other consumer to consume the message?

You can't, not if you only have one partition. Within a consumer group,
each partition is read by a single consumer, unless you deliberately
tear down that consumer and let another take over the partition (if you
are using the high-level consumer).
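
To make the ownership point concrete, here is a simplified model of range-style partition assignment within one consumer group. It is a sketch, not Kafka's actual rebalancing code, and the server names are hypothetical. A consumer can own several partitions, but each partition has exactly one owner in the group at a time.

```python
def range_assign(partitions, consumers):
    """Split sorted partitions into contiguous ranges, one range per consumer."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)  # first `extra` consumers get one more
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


def owner_of(partition, assignment):
    """Return the only consumer in the group that will see this partition."""
    for consumer, owned in assignment.items():
        if partition in owned:
            return consumer
    return None


four = range_assign(range(4), ["server-a", "server-b"])
one = range_assign([0], ["server-a", "server-b"])
print(four)  # {'server-a': [0, 1], 'server-b': [2, 3]}
print(one)   # {'server-a': [0], 'server-b': []}
```

With 4 partitions and 2 consumers, a message written to partition 1 is only ever delivered to server-a in this model; if server-a is busy, server-b cannot see it until a rebalance moves the partition.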

>
> Currently what I am doing is:
>
> First consumer connects to Kafka waits for 300 milliseconds then disconnects, waits for 10 seconds, then reconnects to see if there is a waiting message.

I don't think you need to do this. The high-level consumer has an API
that allows you to set this timeout (I think).

>
> The messages kick off a long task on each server, each server can handle multiple tasks up to a limit so first I am trying to balance the tasks across multiple servers and if they are maxed out don't consume any messages.
>
> This will give the other server or servers a chance to pick up a message and do the task.
>
> I would not disconnect if I can ensure I don't have messages waiting in the queue for a server to consume them without the other servers being able to see them.

I think a better design would be to have a basic consumer that drains
the topic and hands jobs to the set of available workers. *Those*
workers perform the long-running job. Only if there are no available
workers does the consumer block. You may be trying to do too much in
the consumer.
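
A rough sketch of that dispatcher, assuming the workers are threads in a local pool (the function name dispatch and the max_in_flight parameter are made up for the example). The loop over jobs stands in for the consumer reading the topic, and a semaphore makes it block only when every worker slot is taken.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def dispatch(jobs, run_job, max_in_flight=2):
    """Hand jobs to a pool, blocking whenever max_in_flight jobs are already running."""
    slots = threading.BoundedSemaphore(max_in_flight)
    results = []
    results_lock = threading.Lock()

    def wrapped(job):
        try:
            outcome = run_job(job)
            with results_lock:
                results.append(outcome)
        finally:
            slots.release()  # free the slot even if the job raised

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for job in jobs:       # stand-in for iterating over the Kafka stream
            slots.acquire()    # blocks only when every worker is busy
            pool.submit(wrapped, job)
    # Leaving the "with" block waits for all submitted jobs to finish.
    return results


done = dispatch(range(6), lambda n: n * n)
print(sorted(done))  # [0, 1, 4, 9, 16, 25]
```

Because the consumer never disconnects, partition ownership stays put and the group does not rebalance every few seconds. The trade-off is that the draining process, not Kafka, decides which worker gets each job.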

>
> Thanks for the help...
>
> Cheers,
> Eric Sites
>
>