Posted to dev@bookkeeper.apache.org by Venkateswara Rao Jujjuri <ju...@gmail.com> on 2019/02/18 20:22:27 UTC
How to handle stale metadata?
Recently we ran into a situation where the LedgerMetadataListener never
returned or detected a metadata change. Because of this, the reader had stale
metadata and tried to read from bookies that no longer had that ledger, hence
NoSuchLedgerExistsException was returned to the caller.
1. I wonder if NoSuchLedgerExistsException is the right error here?
- The client knows that the ledger exists in the metadata, and it has a
valid handle. So the ledger *exists*.
- In this case the metadata was stale, so a restart of the client took care
of the situation. But what if the ledger is in ZK but missing from all
bookies? That can be a durability or an availability issue, depending on
whether the bookies in the metadata are part of the cluster or not.
- I think we need more sophisticated error handling here.
Comments?
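The distinction raised in point 1 could be sketched roughly as below. This is a minimal Python model, not BookKeeper code: `ReadFailure` and `classify_read_failure` are hypothetical names, and the rule "all ensemble bookies are in the cluster means durability, otherwise availability" is an assumed reading of the point above.

```python
from enum import Enum
from typing import AbstractSet, Sequence


class ReadFailure(Enum):
    """Hypothetical categories for 'every bookie said NoSuchLedger'."""
    STALE_METADATA = "client metadata is out of date; refresh and retry"
    DURABILITY = "ensemble bookies are in the cluster but no longer hold the ledger"
    AVAILABILITY = "some ensemble bookies are not part of the cluster right now"


def classify_read_failure(
    local_version: int,
    current_version: int,
    ensemble: Sequence[str],
    cluster_bookies: AbstractSet[str],
) -> ReadFailure:
    """Classify a read that failed with NoSuchLedger on every ensemble bookie."""
    # The metadata store has moved on: the client was reading a stale ensemble.
    if local_version != current_version:
        return ReadFailure.STALE_METADATA
    # Assumption: if every bookie named in the metadata is a live cluster
    # member and still lacks the ledger, the data itself is gone.
    if all(bookie in cluster_bookies for bookie in ensemble):
        return ReadFailure.DURABILITY
    # Otherwise at least one replica may sit on a bookie that is out of the
    # cluster for now, so the data may merely be unreachable.
    return ReadFailure.AVAILABILITY
```

A distinct error per category would let the caller decide whether to retry, alert, or fail.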
2. Having too many watches puts memory pressure on the client.
- How about having an option to re-read the metadata on demand, without a
watch?
- Schedule a task to re-read the metadata on the first bookie failure with
NoSuchEntry/NoSuchLedger.
- If all three bookies fail, wait for the outstanding metadata read
to return before failing the read back to the user.
- If the metadata is read and is different from the local copy,
reattempt the read.
- If the metadata is not different, then fail with some new error
(DataLossException or something?).
- This can add latency if the metadata is changing a lot, but may be
better than constant watches? It could be a configuration option.
- We could even think of having both enabled if the reader is super
conservative.
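The on-demand re-read steps above can be sketched as follows. This is a Python model of the control flow only, under stated assumptions: `Metadata`, `DataLossError`, `read_with_refresh`, and the two callables are hypothetical stand-ins rather than BookKeeper APIs, and the single retry after a refresh is a choice made for the sketch (the proposal does not say what happens if the second attempt also fails).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Tuple


class DataLossError(Exception):
    """Hypothetical stand-in for the 'some new error' floated above."""


@dataclass(frozen=True)
class Metadata:
    version: int
    ensemble: Tuple[str, ...]


def read_with_refresh(
    local: Metadata,
    fetch_metadata: Callable[[], Metadata],
    bookie_has_entry: Callable[[str], bool],
    executor: ThreadPoolExecutor,
) -> Tuple[str, Metadata]:
    """Return (bookie that served the read, metadata that was used)."""
    pending = None
    for bookie in local.ensemble:
        if bookie_has_entry(bookie):
            return bookie, local
        if pending is None:
            # First NoSuchEntry/NoSuchLedger: schedule the metadata re-read
            # in the background while the remaining bookies are tried.
            pending = executor.submit(fetch_metadata)
    if pending is None:
        raise ValueError("ensemble is empty")
    # Every bookie failed: wait for the outstanding re-read before deciding.
    fresh = pending.result()
    if fresh == local:
        # Metadata was not stale, so the ledger really is missing.
        raise DataLossError("metadata unchanged and no bookie has the entry")
    # Metadata changed: reattempt the read once against the new ensemble.
    for bookie in fresh.ensemble:
        if bookie_has_entry(bookie):
            return bookie, fresh
    # Assumption: the sketch reuses the same error after the one retry.
    raise DataLossError("entry missing from the refreshed ensemble as well")


if __name__ == "__main__":
    stale = Metadata(1, ("bookie-1", "bookie-2", "bookie-3"))
    current = Metadata(2, ("bookie-4", "bookie-5", "bookie-6"))
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(read_with_refresh(stale, lambda: current,
                                lambda b: b == "bookie-5", pool))
```

No watch is registered in this flow; the cost is one extra metadata read on the failure path only.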
Thoughts?
JV
--
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
Re: How to handle stale metadata?
Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
On Mon, Feb 18, 2019 at 5:49 PM Sijie Guo <gu...@gmail.com> wrote:
> On Tue, Feb 19, 2019 at 4:22 AM Venkateswara Rao Jujjuri <jujjuri@gmail.com> wrote:
>
> > Recently we ran into a situation where the LedgerMetadataListener never
> > returned/detected metadata change. Due to this reader had stale metadata
> > and tried to read from bookies that no longer have that ledger, hence
> > NoSuchLedgerExistsException was returned to the caller.
> >
>
> Do the bookies have entries? Or does it change ensemble to cause an empty
> fragment?
>
The replication worker migrated ledgers onto other bookies, but the client is
still pointing to the old set of bookies.
> Those bookies respond NoSuchLedgerExistsException when talking to the empty
> fragment.
> Is that the case? I am not sure how NoSuchLedgerExistsException can be
> propagated to the client.
>
Since the client is contacting the old bookies, which don't have the ledger
anymore, they return NoSuchLedger. All 3 bookies returned the same error,
hence the client returned that to the user.
> Can you describe the sequence on how this happened?
>
>
> >
> > 1. I wonder if NoSuchLedgerExistsException is the right error here?
>
>
> > - Client knows that the ledger exists in the metadata. It has valid
> > handle. So ledger *Exists*.
> > - In this case it is stale metadata so a restart of client took care of
> > the situation. But what if the ledger is in ZK, but missing from all
> > bookies? This can be a durability or availability issue based on the
> > bookies in the metadata are part of the cluster or not.
> > - I think we need to have more sophisticated error handling here.
> > Comments?
> >
>
> >
> > 2. Having too many watches puts memory pressure on the client.
> >
> > - How about having an option to re-read the metadata on demand w/o
> > watch?
> >
>
> +1 this has been in my todo list for a while. we should provide options to
> do this either by watches or by re-read scheduling or both.
>
>
> > - Schedule a task to reread metadata on the first bookie failure with
> > NoSuchEntry/NoSuchLedger.
> > - If all three bookies fail, wait for the outstanding metadata read
> > to return before failing to user.
> > - If the metadata is read, and is different from the local copy,
> > reattempt the read.
> > - If metadata is not different, then fail with "some new error"
> > DataLossException or something?
> > - This can cause latency if the metadata is changing a lot, but may be
> > better than constant watches? It could be a configuration option.
> > - We could even think of having both enabled if the reader is super
> > conservative.
> >
> >
> > Thoughts?
> > JV
Re: How to handle stale metadata?
Posted by Enrico Olivelli <eo...@gmail.com>.
On Tue, Feb 19, 2019, 02:49 Sijie Guo <gu...@gmail.com> wrote:
> On Tue, Feb 19, 2019 at 4:22 AM Venkateswara Rao Jujjuri <jujjuri@gmail.com> wrote:
>
> > Recently we ran into a situation where the LedgerMetadataListener never
> > returned/detected metadata change. Due to this reader had stale metadata
> > and tried to read from bookies that no longer have that ledger, hence
> > NoSuchLedgerExistsException was returned to the caller.
> >
>
> Do the bookies have entries? Or does it change ensemble to cause an empty
> fragment?
> Those bookies respond NoSuchLedgerExistsException when talking to the empty
> fragment.
> Is that the case? I am not sure how NoSuchLedgerExistsException can be
> propagated to the client.
>
> Can you describe the sequence on how this happened?
>
>
> >
> > 1. I wonder if NoSuchLedgerExistsException is the right error here?
>
>
> > - Client knows that the ledger exists in the metadata. It has valid
> > handle. So ledger *Exists*.
> > - In this case it is stale metadata so a restart of client took care of
> > the situation. But what if the ledger is in ZK, but missing from all
> > bookies? This can be a durability or availability issue based on the
> > bookies in the metadata are part of the cluster or not.
> > - I think we need to have more sophisticated error handling here.
> > Comments?
> >
>
> >
> > 2. Having too many watches puts memory pressure on the client.
>
I think a ZooKeeper improvement would help a lot here, but unfortunately that
patch never landed in the ZooKeeper codebase.
It is about having a single persistent and recursive watch which tracks all
the events in a hierarchy of znodes:
https://github.com/apache/zookeeper/pull/136
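As a rough illustration of why a single persistent, recursive watch reduces client-side bookkeeping, here is a toy in-memory model in Python. It is not the ZooKeeper API and not the code from the pull request: `ToyWatchRegistry` and the `/ledgers/...` paths are hypothetical, and the only ZooKeeper behaviour it mirrors is that ordinary watches are one-time triggers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class ToyWatchRegistry:
    """Toy model: one-shot per-znode watches vs. persistent recursive watches."""

    def __init__(self) -> None:
        self._one_shot: DefaultDict[str, List[Callback]] = defaultdict(list)
        self._recursive: DefaultDict[str, List[Callback]] = defaultdict(list)

    def watch(self, path: str, callback: Callback) -> None:
        """Ordinary watch: fires once for exactly this path, then is gone."""
        self._one_shot[path].append(callback)

    def watch_recursive(self, root: str, callback: Callback) -> None:
        """Persistent recursive watch: fires for root and everything below it."""
        self._recursive[root.rstrip("/")].append(callback)

    def registrations(self) -> int:
        """Number of watch registrations currently held in memory."""
        return (sum(len(v) for v in self._one_shot.values())
                + sum(len(v) for v in self._recursive.values()))

    def fire(self, path: str) -> None:
        """Deliver a change event for path to every matching watch."""
        # One-shot watches are removed as they fire and must be re-registered.
        for callback in self._one_shot.pop(path, []):
            callback(path)
        # Recursive watches stay registered and match by path prefix.
        for root, callbacks in self._recursive.items():
            if path == root or path.startswith(root + "/"):
                for callback in callbacks:
                    callback(path)


if __name__ == "__main__":
    paths = [f"/ledgers/L{i:04d}" for i in range(1000)]
    per_node, single = ToyWatchRegistry(), ToyWatchRegistry()
    for p in paths:
        per_node.watch(p, lambda _path: None)
    single.watch_recursive("/ledgers", lambda _path: None)
    print(per_node.registrations(), single.registrations())
```

In the model, watching a thousand ledgers individually holds a thousand registrations, while the recursive variant holds one and never needs re-arming.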
Enrico
>
> > - How about having an option to re-read the metadata on demand w/o
> > watch?
> >
>
> +1 this has been in my todo list for a while. we should provide options to
> do this either by watches or by re-read scheduling or both.
>
>
> > - Schedule a task to reread metadata on the first bookie failure with
> > NoSuchEntry/NoSuchLedger.
> > - If all three bookies fail, wait for the outstanding metadata read
> > to return before failing to user.
> > - If the metadata is read, and is different from the local copy,
> > reattempt the read.
> > - If metadata is not different, then fail with "some new error"
> > DataLossException or something?
> > - This can cause latency if the metadata is changing a lot, but may be
> > better than constant watches? It could be a configuration option.
> > - We could even think of having both enabled if the reader is super
> > conservative.
> >
> >
> > Thoughts?
> > JV
Re: How to handle stale metadata?
Posted by Sijie Guo <gu...@gmail.com>.
On Tue, Feb 19, 2019 at 4:22 AM Venkateswara Rao Jujjuri <ju...@gmail.com>
wrote:
> Recently we ran into a situation where the LedgerMetadataListener never
> returned/detected metadata change. Due to this reader had stale metadata
> and tried to read from bookies that no longer have that ledger, hence
> NoSuchLedgerExistsException was returned to the caller.
>
Do the bookies have the entries? Or did an ensemble change cause an empty
fragment?
Those bookies respond with NoSuchLedgerExistsException when asked about the
empty fragment.
Is that the case? I am not sure how NoSuchLedgerExistsException can be
propagated to the client.
Can you describe the sequence of events that led to this?
>
> 1. I wonder if NoSuchLedgerExistsException is the right error here?
> - Client knows that the ledger exists in the metadata. It has valid
> handle. So ledger *Exists*.
> - In this case it is stale metadata so a restart of client took care of
> the situation. But what if the ledger is in ZK, but missing from all
> bookies? This can be a durability or availability issue based on the
> bookies in the metadata are part of the cluster or not.
> - I think we need to have more sophisticated error handling here.
> Comments?
>
>
> 2. Having too many watches puts memory pressure on the client.
>
> - How about having an option to re-read the metadata on demand w/o
> watch?
>
+1, this has been on my to-do list for a while. We should provide options to
do this by watches, by scheduled re-reads, or both.
> - Schedule a task to reread metadata on the first bookie failure with
> NoSuchEntry/NoSuchLedger.
> - If all three bookies fail, wait for the outstanding metadata read
> to return before failing to user.
> - If the metadata is read, and is different from the local copy,
> reattempt the read.
> - If metadata is not different, then fail with "some new error"
> DataLossException or something?
> - This can cause latency if the metadata is changing a lot, but may be
> better than constant watches? It could be a configuration option.
> - We could even think of having both enabled if the reader is super
> conservative.
>
>
> Thoughts?
> JV