Posted to users@kafka.apache.org by Raghu Angadi <ra...@angadi.org> on 2013/01/04 09:24:47 UTC

How can a producer avoid slow brokers?

Producer distributes messages uniformly across the partitions.

This does not work very well when some of the brokers are much slower than
others. Is there a way to temporarily avoid such slow brokers?

With async producers, I could avoid those that have many more messages
in their internal queue than others (through my own Partitioner).
But the queue size is not available. I tried to maintain my own estimate of
queue size using 'CallbackHandler', but the API does not seem to provide
enough info (it provides the partition id, but not the broker id; plus,
CallbackHandler seems to have been removed in 0.8).

Any suggestions?

Kafka version : 0.7.1

thanks,
Raghu.

Re: How can a producer avoid slow brokers?

Posted by Raghu Angadi <an...@gmail.com>.

We can't afford to block since we don't want the pressure
to percolate upstream (even if we did, scribe would drop the messages
rather than the producer).

The aim is to not block while queueing as long as there are enough brokers
that can handle the load.

With an infinite timeout (or even with a smaller timeout like 1 second), when
a thread invokes producer.send(ProducerData) it does not know whether it is
going to block or not. I am trying to write a partitioner that can avoid
asyncProducers with a higher backlog in their queue.

Of course, if the brokers in aggregate can't handle that load, then there
is no choice but to drop the messages. But currently we end up dropping
messages even with one single slow broker.
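
A minimal sketch of the selection policy described above, assuming the
producer could report per-broker queue activity (which, as noted, the 0.7
Partitioner and CallbackHandler APIs do not expose). BacklogAwareChooser and
its methods are hypothetical names for illustration, not Kafka classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical helper (not part of Kafka): keeps an estimated backlog per
// broker and only hands out brokers whose backlog is below a threshold.
class BacklogAwareChooser {
    private final AtomicIntegerArray backlog; // estimated queued messages per broker
    private final int maxBacklog;             // brokers at or above this are skipped
    private final Random random;

    BacklogAwareChooser(int numBrokers, int maxBacklog, Random random) {
        this.backlog = new AtomicIntegerArray(numBrokers);
        this.maxBacklog = maxBacklog;
        this.random = random;
    }

    // Call when a message is queued for the given broker.
    void onEnqueue(int broker) {
        backlog.incrementAndGet(broker);
    }

    // Call when a batch of `count` messages has been sent to the broker.
    void onSent(int broker, int count) {
        backlog.addAndGet(broker, -count);
    }

    // Returns a random broker with backlog below the threshold, or -1 if
    // every broker is backed up (the caller then has to drop the message).
    int choose() {
        List<Integer> candidates = new ArrayList<>();
        for (int b = 0; b < backlog.length(); b++) {
            if (backlog.get(b) < maxBacklog) {
                candidates.add(b);
            }
        }
        if (candidates.isEmpty()) {
            return -1;
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

With this policy a single slow broker only removes itself from the candidate
set; messages are dropped only when all brokers are over the threshold.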

Raghu.

On Fri, Jan 4, 2013 at 10:49 AM, Neha Narkhede <ne...@gmail.com> wrote:

> queue.enqueueTimeout.ms = -1 will block the producer instead of dropping
> messages. This might be useful if you have the producer being fed by scribe
> aggregators.
>
> Thanks,
> Neha

Re: How can a producer avoid slow brokers?

Posted by Neha Narkhede <ne...@gmail.com>.

queue.enqueueTimeout.ms = -1 will block the producer instead of dropping
messages. This might be useful if you have the producer being fed by scribe
aggregators.
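
To illustrate the three behaviours this setting selects between, here is a
small sketch using a plain java.util.concurrent.BlockingQueue. It assumes the
0.7 semantics as I understand them (0 = drop immediately when the queue is
full, negative = block until there is room, positive = wait that many
milliseconds and then drop). EnqueuePolicy is a made-up name, and this is not
the producer's actual code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustration only: mimics how an enqueue timeout could be applied to a
// bounded in-memory queue. Returns false when the message is dropped.
class EnqueuePolicy {
    static <T> boolean enqueue(BlockingQueue<T> queue, T msg, long timeoutMs) {
        try {
            if (timeoutMs == 0) {
                // Never wait: accept if there is room, otherwise drop.
                return queue.offer(msg);
            }
            if (timeoutMs < 0) {
                // Wait indefinitely; back pressure reaches the caller.
                queue.put(msg);
                return true;
            }
            // Wait up to timeoutMs for room, then drop.
            return queue.offer(msg, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt and treat the message as dropped.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note that with a negative timeout the caller cannot tell in advance whether
the call will block, since that depends on how full the queue is.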

Thanks,
Neha


On Fri, Jan 4, 2013 at 9:04 AM, Raghu Angadi <ra...@gmail.com> wrote:

> On Fri, Jan 4, 2013 at 8:39 AM, Jun Rao <ju...@gmail.com> wrote:
>
> > Do you know why some of the brokers are much slower than others?
>
>
> We are currently running these in a shared environment; to make things
> worse, these machines have a single spindle. We have to put up with that
> until we move the brokers to dedicated hardware with multiple spindles. The
> problem is a bit exaggerated in the current setup.
>
> Even with dedicated hardware, I am expecting some variation. One slightly
> degraded disk out of 12 could reduce effective b/w on all the spindles.
> Unfortunately there will also be occasional rack-level network slowdowns
> that take many hours to get fixed.
>
> In our case, we cannot let the back pressure from slow brokers propagate
> upstream. Producers receive messages from scribe aggregators and just have
> to drop the messages if they can't write fast enough.
>

Re: How can a producer avoid slow brokers?

Posted by Raghu Angadi <ra...@gmail.com>.

On Fri, Jan 4, 2013 at 8:39 AM, Jun Rao <ju...@gmail.com> wrote:

> Do you know why some of the brokers are much slower than others?


We are currently running these in a shared environment; to make things
worse, these machines have a single spindle. We have to put up with that until
we move the brokers to dedicated hardware with multiple spindles. The
problem is a bit exaggerated in the current setup.

Even with dedicated hardware, I am expecting some variation. One slightly
degraded disk out of 12 could reduce effective b/w on all the spindles.
Unfortunately there will also be occasional rack-level network slowdowns that
take many hours to get fixed.

In our case, we cannot let the back pressure from slow brokers propagate
upstream. Producers receive messages from scribe aggregators and just have
to drop the messages if they can't write fast enough.

Re: How can a producer avoid slow brokers?

Posted by Jun Rao <ju...@gmail.com>.

Do you know why some of the brokers are much slower than others?

Thanks,

Jun

On Fri, Jan 4, 2013 at 12:24 AM, Raghu Angadi <ra...@angadi.org> wrote:

> Producer distributes messages uniformly across the partitions.
>
> This does not work very well when some of the brokers are much slower than
> others. Is there a way to temporarily avoid such slow brokers?
>
> With async producers, I could avoid those that have many more messages
> in their internal queue than others (through my own Partitioner).
> But the queue size is not available. I tried to maintain my own estimate of
> queue size using 'CallbackHandler', but the API does not seem to provide
> enough info (it provides the partition id, but not the broker id; plus,
> CallbackHandler seems to have been removed in 0.8).
>
> Any suggestions?
>
> Kafka version : 0.7.1
>
> thanks,
> Raghu.
>

Re: How can a producer avoid slow brokers?

Posted by Raghu Angadi <an...@gmail.com>.

Thanks. Using a single producer is an interesting option. It would limit
the compression bandwidth to a single core, which would be an issue if a
producer handles high volumes. Plus, we have just 10 Kafka servers and even
10% is still quite large.

It may not be that hard to do a little better than the current policy, which
expects negligible difference across brokers. Single producer is an example
of that.

FWIW, avoiding async producers with excessive backlog in the queue
drastically improved the success rate in our case. It needs a patch for Kafka.
Any user fix would be a bit hacky since interfaces like Partitioner don't
have enough info (e.g. they are given just the total number of partitions,
and we have to guess how brokers map to partitions).

Raghu.

On Fri, Jan 4, 2013 at 3:03 PM, Jay Kreps <ja...@gmail.com> wrote:

> I think the problem you are describing is that if a single broker is slow
> all producers will come to a halt (because they all talk to this broker).
>
> We don't have a great solution for this at the moment.
>
> In our own usage for the first tier of data collection, each producer
> connects to a single broker and sends all data there, and if it dies the
> producer reconnects. This somewhat moderates the problem since if only 1 of
> n brokers is slow, only 1/nth of the producers are impacted. This does not
> allow any semantic partitioning by key. You should be able to accomplish
> this with a custom partitioner that chooses a random partition and sticks
> with it instead of round-robining.
>
> A more sophisticated solution might detect slow brokers and shoot them in
> the head. If the detection works correctly and the underlying cause is some
> hardware problem or other process on the machine, then just killing the
> node would fix the problem. However if the problem is just load then this
> will probably make things worse. It is also a bit tricky to define what is
> "slow" and have the user accurately configure that. It would be easy to
> imagine a half-assed implementation causing more problems than it fixed.
>
> -Jay

Re: How can a producer avoid slow brokers?

Posted by Jay Kreps <ja...@gmail.com>.

I think the problem you are describing is that if a single broker is slow
all producers will come to a halt (because they all talk to this broker).

We don't have a great solution for this at the moment.

In our own usage for the first tier of data collection, each producer
connects to a single broker and sends all data there, and if it dies the
producer reconnects. This somewhat moderates the problem since if only 1 of
n brokers is slow, only 1/nth of the producers are impacted. This does not
allow any semantic partitioning by key. You should be able to accomplish
this with a custom partitioner that chooses a random partition and sticks
with it instead of round-robining.
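
A rough sketch of such a "sticky" partitioner follows. It only mirrors the
shape of a partition(key, numPartitions) method and does not implement
Kafka's actual Partitioner interface; the class name and the reset() hook are
assumptions for illustration:

```java
import java.util.Random;

// Picks one random partition on first use and keeps returning it, so each
// producer instance sends everything to a single partition (and broker).
class StickyRandomPartitioner {
    private final Random random;
    private int chosen = -1;     // partition currently stuck to, -1 if none
    private int chosenFor = -1;  // partition count the choice was made for

    StickyRandomPartitioner(Random random) {
        this.random = random;
    }

    // The key is ignored on purpose: this gives up semantic partitioning.
    synchronized int partition(Object key, int numPartitions) {
        if (chosen < 0 || chosenFor != numPartitions) {
            // First call, or the number of partitions changed: pick again.
            chosen = random.nextInt(numPartitions);
            chosenFor = numPartitions;
        }
        return chosen;
    }

    // Hypothetical hook: forget the choice (e.g. after a failure) so the
    // next call picks a new random partition.
    synchronized void reset() {
        chosen = -1;
    }
}
```

Because every producer picks independently, load across brokers evens out
only when there are many producers relative to the number of partitions.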

A more sophisticated solution might detect slow brokers and shoot them in
the head. If the detection works correctly and the underlying cause is some
hardware problem or other process on the machine, then just killing the
node would fix the problem. However if the problem is just load then this
will probably make things worse. It is also a bit tricky to define what is
"slow" and have the user accurately configure that. It would be easy to
imagine a half-assed implementation causing more problems than it fixed.

-Jay

On Fri, Jan 4, 2013 at 12:24 AM, Raghu Angadi <ra...@angadi.org> wrote:

> Producer distributes messages uniformly across the partitions.
>
> This does not work very well when some of the brokers are much slower than
> others. Is there a way to temporarily avoid such slow brokers?
>
> With async producers, I could avoid those that have many more messages
> in their internal queue than others (through my own Partitioner).
> But the queue size is not available. I tried to maintain my own estimate of
> queue size using 'CallbackHandler', but the API does not seem to provide
> enough info (it provides the partition id, but not the broker id; plus,
> CallbackHandler seems to have been removed in 0.8).
>
> Any suggestions?
>
> Kafka version : 0.7.1
>
> thanks,
> Raghu.
>