Posted to users@kafka.apache.org by tao xiao <xi...@gmail.com> on 2015/03/02 12:15:53 UTC

Got negative offset lag after restarting brokers

Hi team,

I have 2 brokers (0 and 1) serving a topic mm-benchmark-test. I did some
tests on the two brokers to verify how the leader got elected. Here are the
steps:

1. started 2 brokers
2. created a topic with partition=1 and replication-factor=2. Now broker 1
was elected as leader
3. sent 1000 messages to the topic and consumed them with a high-level consumer
using zk as the offset storage
4. shut down broker 1, and now broker 0 was elected as leader
5. sent another 1000 messages to the topic and consumed again
6. completely shut down broker 0 and then started broker 1. Now broker 1
became the leader
7. started broker 0 and ran ConsumerOffsetChecker, which showed negative lag
(-1000 in my case)
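
For reference, the steps above map roughly to the commands below; the hosts,
ports and consumer group name are placeholders rather than the exact values
used:

  # step 2: create the topic (assuming a local ZooKeeper)
  bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --topic mm-benchmark-test --partitions 1 --replication-factor 2

  # steps 3 and 5: produce and consume
  bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 \
    --topic mm-benchmark-test
  bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic mm-benchmark-test --from-beginning

  # step 7: check the consumed offset, log size and lag for the group
  bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
    --zookeeper localhost:2181 --group my-group --topic mm-benchmark-test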

I think this is because the consumed offset in zk was 2000 while the log size
retrieved from the leader (broker 1), which missed the 1000 messages from step 5,
was only 1000, so the lag was reported as 1000 - 2000 = -1000.

Is this a bug or expected behavior?

-- 
Regards,
Tao

Re: Got negative offset lag after restarting brokers

Posted by tao xiao <xi...@gmail.com>.
Thanks, guys. With unclean.leader.election.enable set to false, the issue is
fixed.

On Tue, Mar 3, 2015 at 2:50 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> of course :)
> unclean.leader.election.enable



-- 
Regards,
Tao

Re: Got negative offset lag after restarting brokers

Posted by Gwen Shapira <gs...@cloudera.com>.
of course :)
unclean.leader.election.enable
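
A minimal sketch of where this goes; it is a broker-level setting in
server.properties (and its default was true in the broker versions of the
time, which is why the election in step 6 was allowed):

  # server.properties on each broker
  unclean.leader.election.enable=false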

On Mon, Mar 2, 2015 at 9:10 PM, tao xiao <xi...@gmail.com> wrote:
> How do I achieve point 3? Is there a config that I can set?

Re: Got negative offset lag after restarting brokers

Posted by tao xiao <xi...@gmail.com>.
How do I achieve point 3? Is there a config that I can set?

On Tue, Mar 3, 2015 at 1:02 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> The scenario you mentioned is equivalent to an unclean leader election.
> The following settings will make sure there is no data loss:
> 1. Set the replication factor to 3 and the minimum ISR size to 2.
> 2. When producing, use acks=-1 or acks=all.
> 3. Disable unclean leader election.
>
> 1) and 2) guarantee that committed messages will be on at least two brokers.
> 3) means that if a broker is not in the ISR, it cannot be elected as leader, so
> the log truncation mentioned earlier will not happen.
>
> Jiangjie (Becket) Qin


-- 
Regards,
Tao

Re: Got negative offset lag after restarting brokers

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
The scenario you mentioned is equivalent to an unclean leader election.
The following settings will make sure there is no data loss:
1. Set the replication factor to 3 and the minimum ISR size to 2.
2. When producing, use acks=-1 or acks=all.
3. Disable unclean leader election.

1) and 2) guarantee that committed messages will be on at least two brokers.
3) means that if a broker is not in the ISR, it cannot be elected as leader, so
the log truncation mentioned earlier will not happen.
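
A minimal sketch of these settings, assuming broker-level defaults are
acceptable (the replication factor can also be given per topic at creation
time, and min.insync.replicas can be set per topic as well):

  # server.properties on each broker
  default.replication.factor=3
  min.insync.replicas=2
  unclean.leader.election.enable=false

  # producer configuration
  acks=all                   # new (java) producer
  request.required.acks=-1   # old scala producer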

Jiangjie (Becket) Qin

On 3/2/15, 7:16 PM, "tao xiao" <xi...@gmail.com> wrote:

>Since I reused the same consumer group to consume the messages after step 6,
>no data loss occurred. But if I create a new consumer group, the new consumer
>will certainly suffer data loss.
>
>I am more concerned about whether it is acceptable behavior for Kafka that
>an out-of-sync broker can be elected as the leader for a partition. Is
>there any mechanism built around Kafka to ensure that only an in-sync
>broker can be chosen as leader? If not, what is the best practice for
>restarting brokers when some of the replicas are out of sync?


Re: Got negative offset lag after restarting brokers

Posted by tao xiao <xi...@gmail.com>.
Since I reused the same consumer group to consume the messages after step 6,
no data loss occurred. But if I create a new consumer group, the new consumer
will certainly suffer data loss.

I am more concerned about whether it is acceptable behavior for Kafka that
an out-of-sync broker can be elected as the leader for a partition. Is
there any mechanism built around Kafka to ensure that only an in-sync
broker can be chosen as leader? If not, what is the best practice for
restarting brokers when some of the replicas are out of sync?
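
For context, whether the replicas of the partition are in sync can be checked
before a restart with the topics tool; a sketch, assuming a local ZooKeeper:

  bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic mm-benchmark-test
  # the Isr column lists the replicas currently in sync with the leader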

On Tue, Mar 3, 2015 at 2:35 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> In this case you have data loss. In step 6, when broker 1 comes up, it
> becomes the leader and has log end offset 1000. When broker 0 comes up, it
> becomes a follower and will truncate its log to 1000, i.e. 1000 messages
> were lost. The next time the consumer starts, its offset will be reset to
> either the smallest or the largest, depending on the setting.
>
> Jiangjie (Becket) Qin


-- 
Regards,
Tao

Re: Got negative offset lag after restarting brokers

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
In this case you have data loss. In step 6, when broker 1 comes up, it
becomes the leader and has log end offset 1000. When broker 0 comes up, it
becomes a follower and will truncate its log to 1000, i.e. 1000 messages
were lost. The next time the consumer starts, its offset will be reset to
either the smallest or the largest, depending on the setting.
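
(The setting referred to here is the consumer's auto.offset.reset; for the
zookeeper-based high-level consumer it takes the values below, with largest
being the default.)

  # consumer properties for the high-level consumer
  auto.offset.reset=smallest   # or largest (the default)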

Jiangjie (Becket) Qin

On 3/2/15, 9:32 AM, "Stuart Reynolds" <st...@stureynolds.com> wrote:

>Each topic has earliest and latest offsets (per partition).
>Each consumer group has a current offset (per topic-partition pair).
>
>I see -1 for the current offsets of new consumer groups that haven't yet
>committed an offset. I think it means that the offsets for that
>consumer group are undefined.
>
>Is it possible you generated new consumer groups when you restarted your
>broker?


Re: Got negative offset lag after restarting brokers

Posted by Stuart Reynolds <st...@stureynolds.com>.
Each topic has earliest and latest offsets (per partition).
Each consumer group has a current offset (per topic-partition pair).

I see -1 for the current offsets of new consumer groups that haven't yet
committed an offset. I think it means that the offsets for that
consumer group are undefined.
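
For reference, the earliest and latest offsets can be queried directly; a
sketch, assuming a local broker:

  # earliest (--time -2) and latest (--time -1) offsets for each partition
  bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list localhost:9092 --topic mm-benchmark-test --time -2
  bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list localhost:9092 --topic mm-benchmark-test --time -1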

Is it possible you generated new consumer groups when you restarted your broker?




On Mon, Mar 2, 2015 at 3:15 AM, tao xiao <xi...@gmail.com> wrote:
> Hi team,
>
> I have 2 brokers (0 and 1) serving a topic mm-benchmark-test. I did some
> tests on the two brokers to verify how leader got elected. Here are the
> steps:
>
> 1. started 2 brokers
> 2. created a topic with partition=1 and replication-factor=2. Now broker 1
> was elected as leader
> 3. sent 1000 messages to the topic and consumed from a high level consumer
> using zk as the offset storage.
> 4. shutdown broker 1 and now broker 0 was elected as leader
> 5. sent another 1000 messages to topic and consumed again
> 6. completely shutdown broker 0 and then started broker 1. now broker 1
> became the leader
> 7. started broker 0 and ran ConsumerOffsetChecker which showed negative lag
> (-1000 in my case)
>
> I think this is because the consumed offset in zk was 2000 while the log size
> retrieved from the leader (broker 1), which missed the 1000 messages from step 5,
> was only 1000, so the lag was reported as 1000 - 2000 = -1000.
>
> Is this a bug or expected behavior?
>
> --
> Regards,
> Tao

Re: Got negative offset lag after restarting brokers

Posted by Mayuresh Gharat <gh...@gmail.com>.
This is an interesting test.
I suppose this is because, while broker 0 was the leader, broker 1 was
completely down, so broker 1's log end offset never increased. When broker 1
came back up (with broker 0 still down), you got a lag of -1000.
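
In other words, the checker computes roughly:

  lag = log end offset on the current leader - committed consumer offset
      = 1000 - 2000
      = -1000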


Thanks,

Mayuresh

On Mon, Mar 2, 2015 at 3:15 AM, tao xiao <xi...@gmail.com> wrote:

> Hi team,
>
> I have 2 brokers (0 and 1) serving a topic mm-benchmark-test. I did some
> tests on the two brokers to verify how leader got elected. Here are the
> steps:
>
> 1. started 2 brokers
> 2. created a topic with partition=1 and replication-factor=2. Now broker 1
> was elected as leader
> 3. sent 1000 messages to the topic and consumed from a high level consumer
> using zk as the offset storage.
> 4. shutdown broker 1 and now broker 0 was elected as leader
> 5. sent another 1000 messages to topic and consumed again
> 6. completely shutdown broker 0 and then started broker 1. now broker 1
> became the leader
> 7. started broker 0 and ran ConsumerOffsetChecker which showed negative lag
> (-1000 in my case)
>
> I think this is because the consumed offset in zk was 2000 while the log size
> retrieved from the leader (broker 1), which missed the 1000 messages from step 5,
> was only 1000, so the lag was reported as 1000 - 2000 = -1000.
>
> Is this a bug or expected behavior?
>
> --
> Regards,
> Tao
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125