Posted to users@kafka.apache.org by allen chan <al...@gmail.com> on 2016/01/30 08:50:22 UTC

Questions from new user

Use case: We are using Kafka as a broker in one of our Elasticsearch
clusters. Kafka buffers the logs if Elasticsearch has any performance
issues. I have Kafka set to delete logs pretty quickly, to keep things in
the file cache and limit IO.

Questions:
1. In 0.9 it seems like consumer offsets are stored only in Kafka. Is there
a way to configure Kafka to delete my production logs pretty quickly but
have a different retention behavior for the consumer offsets?

2. Our consumer lag monitoring shows that our consumers are often behind by
somewhere between 500 and 1,000 messages. Looking at the JMX metrics
requestSizeAvg and requestSizeMax, our average request size is 500 bytes
and our max request size is 800,000 bytes. I assume the lag is because a
batch could hold only one message, given the max is 1,000,000 bytes. As a
short-term fix I plan to enable compression and increase the max bytes
setting to 10 MB. In a few blogs, people mentioned that the ultimate fix
should be splitting the message into smaller chunks in the producer and
then having the consumer put it back together. Is that handled natively by
the Kafka producer/consumer, or does it have to be handled outside of it?
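For reference, the short-term fix described above (compression plus a larger
request size) might look roughly like this for the 0.9 Java producer; the
values are illustrative assumptions, not recommendations:

```
# 0.9 producer config sketch; values are illustrative
compression.type=gzip
# allow produce requests up to ~10 MB
max.request.size=10485760
```

Note that the broker's message.max.bytes (and replica.fetch.max.bytes for
replication) and the consumer's fetch size (max.partition.fetch.bytes in the
new consumer) have to be raised to match, or large messages will be rejected
by the broker or unreadable by the consumer.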

Thanks for the attention.
Allen Chan

Re: Questions from new user

Posted by Alexis Midon <al...@airbnb.com.INVALID>.
0. I don't understand how deleting log files quickly relates to the
file/page cache. Consumer read patterns are the main factor here, afaik.
The OS will eventually discard unused cached pages. I'm not an expert on
page-cache policies, though, and would be happy to learn.

1. Have a look at the per-topic configs:
https://kafka.apache.org/documentation.html#topic-config
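For example, a per-topic retention override can be set with the stock
tooling, while committed offsets live in the internal __consumer_offsets
topic and are governed separately by the broker's offsets.retention.minutes
setting. A sketch (the topic name and ZooKeeper address are hypothetical):

```
# keep the log-shipping topic for only one hour
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name app-logs \
  --add-config retention.ms=3600000

# broker config (server.properties): committed offsets are kept per
#   offsets.retention.minutes=1440
```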

2.
- Please tell us:
  . which version of the Kafka producer/consumer you're using (0.9 or 0.8)?
  . which exact metrics you're referring to?
http://docs.confluent.io/1.0/kafka/monitoring.html

- I'm assuming you're talking about the request-size-avg metric of the
0.9 __producer__, as described in
http://kafka.apache.org/documentation.html#new_producer_monitoring

If so, the produce request will indeed be capped by the max message size.
Another limiting factor would be the `linger.ms` setting of the producer,
in case of low message rate.
Otherwise, please share the exact metrics you're using for the consumer lag.

- Splitting/assembling messages in the application sounds like quite a pain.
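To make the pain concrete: Kafka itself does not split or reassemble
messages, so the application would have to frame each chunk with an id and
sequence number and rebuild on the consumer side. A minimal sketch of that
framing (pure Python, no Kafka involved; all names and sizes are
hypothetical):

```python
import json

CHUNK_SIZE = 4  # tiny for illustration; a real producer might use ~500 KB


def split_message(msg_id: str, payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Frame a large payload as a list of small, self-describing chunks."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    return [
        json.dumps({
            "id": msg_id,          # groups chunks of one logical message
            "seq": seq,            # position for reassembly
            "total": len(chunks),  # lets the consumer know when it is done
            "data": chunk.decode("latin-1"),  # bytes-safe 1:1 text encoding
        })
        for seq, chunk in enumerate(chunks)
    ]


def reassemble(frames):
    """Rebuild payloads from frames that may arrive interleaved or out of order."""
    pending, done = {}, {}
    for frame in frames:
        f = json.loads(frame)
        parts = pending.setdefault(f["id"], {})
        parts[f["seq"]] = f["data"].encode("latin-1")
        if len(parts) == f["total"]:
            done[f["id"]] = b"".join(parts[s] for s in sorted(parts))
    return done
```

A real implementation would also have to expire partial messages that never
complete, and send all chunks of one logical message to the same partition
so that ordering holds.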

- The lag could also be introduced by the application processing the
messages. Have you checked that side?



On Tue, Feb 16, 2016 at 7:30 PM allen chan <al...@gmail.com>
wrote:

> Hi can anyone help with this?

Re: Questions from new user

Posted by allen chan <al...@gmail.com>.
Hi can anyone help with this?


-- 
Allen Michael Chan