Posted to dev@bookkeeper.apache.org by zhangao <ga...@qq.com.INVALID> on 2021/09/14 11:17:24 UTC

AutoRecovery failed replicate ledger , because, it would read lac from failed bookie

As the title says: when a bookie is lost, a ledger that is still in the open state cannot be re-replicated, because AutoRecovery tries to read the LAC (last add confirmed) from the failed bookie.
That read fails because the failed bookie cannot be reached.

How does BookKeeper AutoRecovery deal with open ledgers on a failed bookie?

I don't know if it's a bug or not.

The error log:

12:29:57.072 [main-EventThread] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve x.x.x.x:3181, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available

12:29:57.072 [main-EventThread] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running

12:29:57.078 [BookKeeperClientWorker-OrderedExecutor-29-0] INFO  org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 96789 did not hear success responses from all of ensemble

12:29:57.078 [ReplicationWorker] INFO  org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while rereplicating ledger 96789. Enough Bookies might not have available So, no harm to continue

Re: AutoRecovery failed replicate ledger , because, it would read lac from failed bookie

Posted by Jack Vanlightly <jv...@splunk.com.INVALID>.
An LAC read will fail in this way if Ack Quorum or more bookies respond
with anything other than OK, NoSuchEntry, or NoSuchLedger.

What is your ack quorum? If it is just 1 (not a good setting), then a
single bookie being down will make the LAC read fail this way. If your ack
quorum is 2, then 2 bookies being down will cause it, and so on.

Jack
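
The rule described in the reply above can be sketched in a few lines of Java. This is only an illustration of the counting logic as stated in the thread, not BookKeeper's actual implementation: the class name, the Response enum, and both methods are hypothetical, and UNAVAILABLE stands in for any response other than OK, NoSuchEntry, or NoSuchLedger (for example, a bookie that cannot be reached).

```java
import java.util.Arrays;
import java.util.List;

public class LacReadQuorumSketch {
    // Hypothetical response codes for illustration only;
    // these are not BookKeeper's real constants.
    enum Response { OK, NO_SUCH_ENTRY, NO_SUCH_LEDGER, UNAVAILABLE }

    // Per the rule in the thread: anything other than OK, NoSuchEntry,
    // or NoSuchLedger counts against the LAC read.
    static boolean countsAsFailure(Response r) {
        return r != Response.OK
            && r != Response.NO_SUCH_ENTRY
            && r != Response.NO_SUCH_LEDGER;
    }

    // The read fails once the number of failing responses
    // reaches the ack quorum.
    static boolean lacReadFails(List<Response> responses, int ackQuorum) {
        long failures = responses.stream()
            .filter(LacReadQuorumSketch::countsAsFailure)
            .count();
        return failures >= ackQuorum;
    }

    public static void main(String[] args) {
        // Ensemble of three bookies with one of them down.
        List<Response> oneDown =
            Arrays.asList(Response.OK, Response.OK, Response.UNAVAILABLE);
        System.out.println(lacReadFails(oneDown, 1)); // true: ack quorum 1, one bookie down
        System.out.println(lacReadFails(oneDown, 2)); // false: ack quorum 2 tolerates one bookie down
    }
}
```

Under these assumptions, an ack quorum of 1 makes a single unreachable bookie enough to fail the read, which matches the symptom in the log above.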
