Posted to dev@bookkeeper.apache.org by zhangao <ga...@qq.com.INVALID> on 2021/09/14 11:17:24 UTC
AutoRecovery fails to replicate a ledger because it reads the LAC from a failed bookie
As the title says, when a bookie is lost, a ledger that is still in the open state cannot be replicated, because replication tries to read the LAC (last add confirmed) from the failed bookie.
The LAC read fails because the failed bookie cannot be reached.
How does BookKeeper AutoRecovery deal with open ledgers on a failed bookie?
I don't know whether this is a bug.
The error log:
12:29:57.072 [main-EventThread] INFO org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve x.x.x.x:3181, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
12:29:57.072 [main-EventThread] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
12:29:57.078 [BookKeeperClientWorker-OrderedExecutor-29-0] INFO org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 96789 did not hear success responses from all of ensemble
12:29:57.078 [ReplicationWorker] INFO org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while rereplicating ledger 96789. Enough Bookies might not have available So, no harm to continue
Re: AutoRecovery fails to replicate a ledger because it reads the LAC from a failed bookie
Posted by Jack Vanlightly <jv...@splunk.com.INVALID>.
An LAC read will fail in this way if Ack Quorum or more bookies respond
with anything other than OK, NoSuchEntry, or NoSuchLedger.
What is your ack quorum? If it is just 1 (not a good setting), then a
single bookie being down will make the LAC read fail this way. If your ack
quorum is 2, then 2 bookies being down will cause it, and so on.
Jack
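
The rule described above can be sketched in a few lines of Java. This is only an illustration of the counting rule as stated in the reply, not BookKeeper's actual code: the class `LacReadSketch`, the `Response` enum, and the method `lacReadFails` are hypothetical names made up for this example, and the real client tracks responses differently.

```java
import java.util.List;

public class LacReadSketch {
    // Hypothetical response kinds for illustration only.
    // BOOKIE_UNAVAILABLE stands in for any error such as a bookie being down.
    enum Response { OK, NO_SUCH_ENTRY, NO_SUCH_LEDGER, BOOKIE_UNAVAILABLE }

    // Responses that do not count against the LAC read, per the rule above.
    static boolean isTolerated(Response r) {
        return r == Response.OK
                || r == Response.NO_SUCH_ENTRY
                || r == Response.NO_SUCH_LEDGER;
    }

    // The read fails once the number of non-tolerated responses
    // reaches the ack quorum.
    static boolean lacReadFails(List<Response> responses, int ackQuorum) {
        long bad = responses.stream().filter(r -> !isTolerated(r)).count();
        return bad >= ackQuorum;
    }

    public static void main(String[] args) {
        // Three bookies in the ensemble, one of them down.
        List<Response> oneDown = List.of(
                Response.OK, Response.OK, Response.BOOKIE_UNAVAILABLE);
        System.out.println("ackQuorum=1 fails: " + lacReadFails(oneDown, 1));
        System.out.println("ackQuorum=2 fails: " + lacReadFails(oneDown, 2));
    }
}
```

Under these assumptions, a single unreachable bookie is enough to fail the read when the ack quorum is 1, while an ack quorum of 2 tolerates it.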