You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Jun Rao <ju...@confluent.io> on 2015/01/01 00:05:44 UTC

Re: metrics about how behind a replica is?

In 0.8.2, we have a new java producer that allows to you specify a callback
for each message to be sent.

Thanks,

Jun

On Thu, Dec 18, 2014 at 12:07 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:

> @Jun, We can increase the number of resends, but the produce request may
> still fail.
>
> For async producer, at the time when it fails, we have
>
>    - messages that are in queue but has not been sent. From javaapi, we
>    don't know which messages are still in queue.
>
>
>    - Is it possible that we expose the blocking queue size so we know what
>       remains in the queue?
>
>
>    - messages we have failed retrying. For the last batch, some may have
>    succeeded, but some failed retrying. From javaapi, we don't know what
> are
>    the messages failed.
>       - Is it possible to dump the failed messages to a file so that the
>       next run can pick them up?
>
> Does this make sense? Is there other way you will recommend to keep track
> of messages that have been sent for async producer?
>
> Thanks
>
>
>
> On Wed, Dec 17, 2014 at 10:58 AM, Jun Rao <ju...@confluent.io> wrote:
> >
> > You can configure the number of resends on the producer.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Dec 17, 2014 at 10:34 AM, Xiaoyu Wang <xw...@rocketfuel.com>
> > wrote:
> > >
> > > I have tested using "async" producer with "required.ack=-1" and got
> > really
> > > good performance.
> > >
> > > We have not used async producer much previously, any potential dataloss
> > > when a broker goes down? For example, when a broker goes down, does
> > > producer resend all the messages in a batch?
> > >
> > >
> > > On Wed, Dec 17, 2014 at 1:16 PM, Xiaoyu Wang <xw...@rocketfuel.com>
> > wrote:
> > > >
> > > > Thanks Jun.
> > > >
> > > > We have tested our producer with the different required.ack config.
> > Even
> > > > with the required.ack=1, the producer is > 10 times slower than with
> > > > required.ack=0. Does this confirm with your  testing?
> > > >
> > > > I saw the presentation of LinkedIn Kafka SRE. Wondering what
> > > configuration
> > > > you guys have at LinkedIn to guarantee zero data loss.
> > > >
> > > > Thanks again and really appreciate your help!
> > > >
> > > > On Tue, Dec 16, 2014 at 9:50 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >>
> > > >> replica.lag.max.messages only controls when a replica should be
> > dropped
> > > >> out
> > > >> of the in-sync replica set (ISR). For a message to be considered
> > > >> committed,
> > > >> it has to be added to every replica in ISR. When the producer uses
> > > ack=-1,
> > > >> the broker waits until the produced message is committed before
> > > >> acknowledging the client. So in the case of a clean leader election
> > > (i.e.,
> > > >> there is at least one remaining replica in ISR), no committed
> messages
> > > are
> > > >> lost. In the case of an unclean leader election, the number of
> > messages
> > > >> that can be lost depends on the state of the replicas and it's
> > possible
> > > to
> > > >> lose more than replica.lag.max.messages messages.
> > > >>
> > > >> We do have the lag jmx metric per replica (see
> > > >> http://kafka.apache.org/documentation.html#monitoring).
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Sun, Dec 14, 2014 at 7:20 AM, Xiaoyu Wang <xw...@rocketfuel.com>
> > > >> wrote:
> > > >> >
> > > >> > Hello,
> > > >> >
> > > >> > If I understand it correctly, when the number of messages a
> replica
> > is
> > > >> > behind from the leader is < replica.lag.max.messages, the replica
> is
> > > >> > considered in sync with the master and are eligible for leader
> > > election.
> > > >> >
> > > >> > This means we can lose at most replica.lag.max.messages messages
> > > during
> > > >> > leader election, is it? We can set the replica.lag.max.messages to
> > be
> > > >> very
> > > >> > low, but then we may result in unclean leader election, so still
> we
> > > can
> > > >> > lose data.
> > > >> >
> > > >> > Can you recommend some way to prevent data loss? We have tried
> > setting
> > > >> > require ack from all replicas, but that slows down producer
> > > >> significantly.
> > > >> >
> > > >> > In addition, do we have metrics about how far each replica is
> > behind?
> > > If
> > > >> > not, can we add them.
> > > >> >
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >>
> > > >
> > >
> >
>