Posted to users@kafka.apache.org by Alexandre Dupriez <al...@gmail.com> on 2021/05/16 11:44:19 UTC

Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

Hi Pieter,

FWIW, you may have encountered the following bug:
https://issues.apache.org/jira/browse/KAFKA-12671 .
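
For context on why a single unfinished transaction can block read_committed
consumers, here is a minimal sketch of how the LSO relates to the high
watermark. It is a simplified toy model, not broker code: the Transaction
class and the last_stable_offset function are hypothetical names made up for
illustration, and the offsets are invented. The assumption is that the LSO is
the smaller of the high watermark and the first offset of the earliest
transaction that has no commit or abort marker yet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    """Hypothetical record of one transaction on a partition."""
    producer_id: int
    first_offset: int        # offset of the first record written by the transaction
    completed: bool = False  # True once a COMMIT or ABORT marker has been written


def last_stable_offset(high_watermark: int, transactions: List[Transaction]) -> int:
    """Simplified model: the LSO cannot move past the first offset of any
    transaction that is still open; with no open transaction it equals the
    high watermark."""
    open_starts = [t.first_offset for t in transactions if not t.completed]
    return min([high_watermark] + open_starts)


# Invented example: producer 2 never received a commit/abort marker.
txns = [
    Transaction(producer_id=1, first_offset=100, completed=True),
    Transaction(producer_id=2, first_offset=250),
    Transaction(producer_id=3, first_offset=900, completed=True),
]
stuck = last_stable_offset(1000, txns)      # pinned at the open transaction

txns[1].completed = True                    # the hanging transaction is finally aborted
recovered = last_stable_offset(1000, txns)  # catches up with the high watermark

print(stuck, recovered)  # prints "250 1000"
```

In this model the producer can keep appending and the latest offset keeps
growing, yet the LSO stays at 250 until the open transaction is completed,
which matches the symptoms described below.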

Thanks,
Alexandre

On Fri, 12 Jun 2020 at 00:43, D C <dr...@gmail.com> wrote:
>
> Hey peeps,
>
> Anyone else encountered this and got to the bottom of it?
>
> I'm facing a similar issue: the LSO is stuck for some partitions in a topic
> and the consumers can't get data out of it (we're using isolation.level =
> read_committed).
>
> When this issue started happening we were on Kafka 2.3.1.
> I tried:
> - restarting the consumers
> - deleting the partition from the leader and letting it get in sync with
> the new leader
> - rolling restart of the brokers
> - shutting down the whole cluster and starting it again
> - deleting the txnindex files (after backing them up) and restarting
> the brokers
> - taking down the follower brokers of a partition and resyncing that
> partition on them from scratch
> - upgrading both the Kafka brokers and clients to 2.5.0
>
> Now the following questions arise:
> Where is the LSO actually stored? (Even if you get rid of the txnindex
> files, the LSO stays the same.)
> Is there any way that the LSO can be reset?
> Is there any way to manually abort and clean up the state of a stuck
> transaction? (I suspect that this is the reason why the LSO is stuck.)
> Is there any way to manually trigger a consistency check on the logfiles
> that would fix any existing issues with either the logs or the indexes in
> the partition?
>
> Cheers,
> Dragos
>
> On 2019/11/20 13:26:54, Pieter Hameete <pi...@blockbax.com> wrote:
> > Hello,
> >
> > After some broker issues (too many open files) we managed to recover our brokers, but read_committed consumers are stuck on a specific topic partition. The LSO seems to be stuck at a specific offset. The transactional producer for the topic partition is working without errors, so the latest offset is advancing correctly and transactional producing still works.
> >
> > What could be wrong here? And how can we get this specific LSO to increment again?
> >
> > Thank you in advance for any advice.
> >
> > Best,
> >
> > Pieter
> >