Posted to dev@bookkeeper.apache.org by Sijie Guo <gu...@gmail.com> on 2012/12/12 06:47:59 UTC

Question about AckSet.

Hello guys,

AckSet was introduced recently to resolve the slow bookie issue, but the
semantics of 'writeQuorumSize' are not clear. What goal should we achieve
for 'writeQuorumSize':

a) an entry is eventually written to all 'writeQuorumSize' bookies
b) we only guarantee that the write is issued to 'writeQuorumSize' bookies
and that at least 'ackQuorumSize' bookies have acked; we don't care about
the other bookies.

semantic a):

If we want a), it currently doesn't work. For example, take a writing
ensemble (A, B, C, D) where writeQuorumSize is 3, ackQuorumSize is 2 and C
is a slow bookie.

1. The client adds entry 0, and entry 0 is acked.
2. At time T, C times out (because it is slow or has failed), so adding
entry 0 at C fails with BookieHandleNotAvailable. The client tries to pick
a new bookie and calls unsetSuccessAndSendWriteRequest for the new bookie.
But nothing is executed, since the pendingAddOps queue is empty, so nothing
is added to the new bookie.

    private void unsetSuccessAndSendWriteRequest(final int bookieIndex) {
        for (PendingAddOp pendingAddOp : pendingAddOps) {
            pendingAddOp.unsetSuccessAndSendWriteRequest(bookieIndex);
        }
    }
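
To make the point concrete, here is a minimal, self-contained sketch of the
situation. It is not the real client code: PendingAddSketch, its PendingAddOp
stand-in and the returned counter are hypothetical simplifications, and it
assumes an op is dequeued as soon as it reaches the ack quorum. Once entry 0
has been dequeued, the loop has nothing to iterate over, so no write is
re-sent to the replacement bookie.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified stand-ins; not the real BookKeeper classes.
class PendingAddSketch {
    /** Minimal pending add: records which bookie indexes it was re-sent to. */
    static class PendingAddOp {
        final long entryId;
        final List<Integer> resentTo = new ArrayList<>();

        PendingAddOp(long entryId) {
            this.entryId = entryId;
        }

        void unsetSuccessAndSendWriteRequest(int bookieIndex) {
            resentTo.add(bookieIndex);
        }
    }

    final Queue<PendingAddOp> pendingAddOps = new ArrayDeque<>();

    /** Same shape as the loop quoted above; also returns how many ops were re-sent. */
    int unsetSuccessAndSendWriteRequest(int bookieIndex) {
        int resent = 0;
        for (PendingAddOp op : pendingAddOps) {
            op.unsetSuccessAndSendWriteRequest(bookieIndex);
            resent++;
        }
        return resent;
    }

    public static void main(String[] args) {
        PendingAddSketch handle = new PendingAddSketch();
        handle.pendingAddOps.add(new PendingAddOp(0));
        // Assumption: entry 0 reached ackQuorumSize, so it was acked and dequeued.
        handle.pendingAddOps.remove();
        // Bookie at index 2 (C) is replaced afterwards: nothing is left to re-send.
        System.out.println(handle.unsetSuccessAndSendWriteRequest(2)); // prints 0
    }
}
```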

If we want to achieve semantic a), it would be difficult to handle the
client failure case.

semantic b):

If we just want to achieve b), do we need to pick a new bookie to replace
the slow bookie? Even if the slow bookie is replaced, no entry would be
added to the new one, since pendingAddOps is empty.

I raise this question because picking a new bookie to replace the slow
bookie might cause the ledger to be closed due to
NotEnoughBookiesException. It is easy to reproduce the case using an
existing test case. Please see the attached file.

If we allow b), then maybe we don't need handleBookieFailure when a late
response arrives at the client after its entry has been acked.

-Sijie

Re: Question about AckSet.

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
Reading a bit more into your first message, Sijie, I think you're focusing on this modification:

> If we allow b), then maybe we don't need handleBookieFailure when a late
> response arrives at the client after its entry has been acked.

Is that right? If I understand this part right, you're proposing that we postpone executing handleBookieFailure until we actually need to write another entry to the bookie. If we do so, then we don't need to bring another bookie in when there are no more entries to write and all entries so far have been successfully written. It sounds like it would be ok to do it your way, but I'm still not seeing a major benefit. It is not clear how frequently we have the case that a bookie is detected as slow or dead, there are no more entries to add, and all previous entries have been successfully added.

-Flavio 

On Dec 12, 2012, at 8:35 AM, Flavio Junqueira wrote:

> When we replace a bookie, it won't do any work for previous entries, but it will do for future entries. The case in which no more entries are added once a bookie is replaced is a special case. I'm not sure if this is what you're referring to.
> 
> In general, ops should remove slow bookies from the system if they are consistently slow. This quorum mechanism we are talking about is useful when bookies are temporarily slow or until we have enough time to detect and remove faulty bookies. 
> 
> I can see that there could be some updates we can save in some special cases, but I'm not convinced that it would be huge savings as you're implying. I believe we are really talking about exceptional cases here, not regular cases. 
> 
> -Flavio
> 


Re: Question about AckSet.

Posted by Sijie Guo <gu...@gmail.com>.
Either a) or b) is OK for me. I just think we need to clarify this for the
4.2.0 release, right?


On Thu, Dec 13, 2012 at 12:37 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> Guaranteeing that eventually everyone in the write quorum will receive it
> (a) implies that we can't complete the operation until all of them ack,
> although we might end up notifying the client before the operation
> completes. Is it what you'd like to have?
>
> -Flavio
>

Re: Question about AckSet.

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
Guaranteeing that eventually everyone in the write quorum will receive it (a) implies that we can't complete the operation until all of them ack, although we might end up notifying the client before the operation completes. Is it what you'd like to have?

-Flavio 

On Dec 13, 2012, at 7:47 AM, Sijie Guo wrote:

> > If you avoid replacing it though, then you are vulnerable to another bookie in the ensemble slowing down.
> 
> Thinking a bit more about it, that is OK for me.
> 
> > Currently we guarantee b) but a) wouldn't be hard to do. We just have
> > to avoid removing PendingAddOps until the ackSet is complete.
> 
> Could we add some comments about that, to clarify the situation?


Re: Question about AckSet.

Posted by Sijie Guo <gu...@gmail.com>.
>> Could we add some comments about that, to clarify the situation?

I meant comments in javadoc or documentation.


On Wed, Dec 12, 2012 at 10:47 PM, Sijie Guo <gu...@gmail.com> wrote:

> > If you avoid replacing it though, then you are vulnerable to another
> > bookie in the ensemble slowing down.
>
> Thinking a bit more about it, that is OK for me.
>
> > Currently we guarantee b) but a) wouldn't be hard to do. We just have
> > to avoid removing PendingAddOps until the ackSet is complete.
>
> Could we add some comments about that, to clarify the situation?

Re: Question about AckSet.

Posted by Sijie Guo <gu...@gmail.com>.
> If you avoid replacing it though, then you are vulnerable to another
> bookie in the ensemble slowing down.

Thinking a bit more about it, that is OK for me.

> Currently we guarantee b) but a) wouldn't be hard to do. We just have
> to avoid removing PendingAddOps until the ackSet is complete.

Could we add some comments about that, to clarify the situation?



On Wed, Dec 12, 2012 at 2:11 AM, Ivan Kelly <iv...@apache.org> wrote:

> Currently we guarantee b) but a) wouldn't be hard to do. We just have
> to avoid removing PendingAddOps until the ackSet is complete.
>
> > In general, I think avoiding replacing the slow bookie doesn't violate
> > the contract provided by BookKeeper.
> If you avoid replacing it though, then you are vulnerable to another
> bookie in the ensemble slowing down.
>
> -Ivan
>

Re: Question about AckSet.

Posted by Ivan Kelly <iv...@apache.org>.
Currently we guarantee b) but a) wouldn't be hard to do. We just have
to avoid removing PendingAddOps until the ackSet is complete.
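
A rough sketch of what that could look like, under stated assumptions:
AckSetSketch, PendingAdd, ack(), fullyWritten() and drainCompleted() are
hypothetical names, not the real client API. The application is notified as
soon as ackQuorumSize bookies have acked, but the op stays in the queue until
every member of its write set has acked, so a replacement bookie could still
be sent the entry.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Queue;

// Hypothetical sketch of semantic a); names are illustrative only.
class AckSetSketch {
    static class PendingAdd {
        final long entryId;
        final int writeQuorumSize;
        final int ackQuorumSize;
        final BitSet acked = new BitSet();
        boolean clientNotified = false;

        PendingAdd(long entryId, int writeQuorumSize, int ackQuorumSize) {
            this.entryId = entryId;
            this.writeQuorumSize = writeQuorumSize;
            this.ackQuorumSize = ackQuorumSize;
        }

        /** Record an ack from one member of the write set (0..writeQuorumSize-1). */
        void ack(int writeSetIndex) {
            acked.set(writeSetIndex);
            if (acked.cardinality() >= ackQuorumSize) {
                clientNotified = true; // the application callback would fire here
            }
        }

        /** True only when every bookie in the write set has acked. */
        boolean fullyWritten() {
            return acked.cardinality() == writeQuorumSize;
        }
    }

    final Queue<PendingAdd> pendingAddOps = new ArrayDeque<>();

    /** Dequeue from the head only while the ack set is complete. */
    void drainCompleted() {
        while (!pendingAddOps.isEmpty() && pendingAddOps.peek().fullyWritten()) {
            pendingAddOps.remove();
        }
    }
}
```

The cost of this design is that the queue can grow while a bookie is slow,
so a real implementation would need some bound or timeout; that part is not
sketched here.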

> 
> In general, I think avoiding replacing the slow bookie doesn't violate the
> contract provided by BookKeeper.
If you avoid replacing it though, then you are vulnerable to another
bookie in the ensemble slowing down.

-Ivan

Re: Question about AckSet.

Posted by Sijie Guo <gu...@gmail.com>.
> The case in which no more entries are added once a bookie is replaced is
> a special case

This is the case I was referring to. Example:

Ensemble (A, B, C, D), write quorum size is 3, ack quorum size is 2.
Entries 0 ~ N are added, but C is slow, so entry 0 times out. A new bookie
E is brought in to replace C, so the ensemble changes to 0=>(A, B, E, D).
Auto-replication would then treat entries 0 ~ N as under-replicated and
schedule re-replication for them, although C might eventually have all of
entries 0 ~ N.
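
As a rough illustration of how many entries such a replacement touches, the
sketch below assumes entries are striped round-robin over the ensemble
(entry e goes to ensemble indexes e, e+1, ... modulo the ensemble size).
That placement rule is an assumption of this sketch, not something stated in
this thread, and WriteSetSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes round-robin striping over the ensemble.
class WriteSetSketch {
    /** Ensemble indexes that entryId is written to under round-robin striping. */
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorumSize) {
        List<Integer> set = new ArrayList<>();
        for (int i = 0; i < writeQuorumSize; i++) {
            set.add((int) ((entryId + i) % ensembleSize));
        }
        return set;
    }

    /** Entries in [0, lastEntryId] whose write set includes the given ensemble index. */
    static List<Long> entriesTouching(int bookieIndex, long lastEntryId,
                                      int ensembleSize, int writeQuorumSize) {
        List<Long> entries = new ArrayList<>();
        for (long e = 0; e <= lastEntryId; e++) {
            if (writeSet(e, ensembleSize, writeQuorumSize).contains(bookieIndex)) {
                entries.add(e);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Ensemble (A, B, C, D) -> indexes 0..3; C is index 2; write quorum 3.
        System.out.println(writeSet(0, 4, 3));           // prints [0, 1, 2]
        System.out.println(entriesTouching(2, 7, 4, 3)); // prints [0, 1, 2, 4, 5, 6]
    }
}
```

Under that assumption, three out of every four entries in the example have C
in their write set, so replacing C in the metadata marks most of 0 ~ N as
needing re-replication.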

> bookies are temporarily slow
> In general, ops should remove slow bookies from the system if they are
> consistently slow.

That depends on how fast ops can be notified of such a slow bookie. By the
time ops notice a slow bookie, has it already affected the system,
especially the metadata store, and is it already too late?

Also please note that a slow bookie might not affect just a few ledgers; it
affects every ledger that is adding entries to that slow bookie. So I think
we need Jiannan and Fangmin to consider this case, since their use case has
lots of ledgers adding entries, rather than a few ledgers adding lots of
entries.

> I can see that there could be some updates we can save in some special
> cases, but I'm not convinced that it would be huge savings as you're
> implying.

It depends on the number of ledgers writing to a slow bookie during the
interval before it returns to normal or is removed. In normal cases I don't
have any concern, but as the scale (the number of ledgers) goes up, I
remain concerned.

In general, I think avoiding replacing the slow bookie doesn't violate the
contract provided by BookKeeper.


On Tue, Dec 11, 2012 at 11:35 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> When we replace a bookie, it won't do any work for previous entries, but
> it will do for future entries. The case in which no more entries are added
> once a bookie is replaced is a special case. I'm not sure if this is what
> you're referring to.
>
> In general, ops should remove slow bookies from the system if they are
> consistently slow. This quorum mechanism we are talking about is useful
> when bookies are temporarily slow or until we have enough time to detect
> and remove faulty bookies.
>
> I can see that there could be some updates we can save in some special
> cases, but I'm not convinced that it would be huge savings as you're
> implying. I believe we are really talking about exceptional cases here, not
> regular cases.
>
> -Flavio

Re: Question about AckSet.

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
When we replace a bookie, it won't do any work for previous entries, but it will do for future entries. The case in which no more entries are added once a bookie is replaced is a special case. I'm not sure if this is what you're referring to.

In general, ops should remove slow bookies from the system if they are consistently slow. This quorum mechanism we are talking about is useful when bookies are temporarily slow or until we have enough time to detect and remove faulty bookies. 

I can see that there could be some updates we can save in some special cases, but I'm not convinced that it would be huge savings as you're implying. I believe we are really talking about exceptional cases here, not regular cases. 

-Flavio

On Dec 12, 2012, at 7:48 AM, Sijie Guo wrote:

> > On the other hand, this is what the application asked us to do, so bk
> > should fulfill its commitment.
> 
> The key point here is that the client replaces the slow bookie with a
> brand new bookie but does not add any of the already ack'ed entries to the
> new bookie. So it already doesn't fulfill the commitment; why do we need
> this step, which introduces one metadata update? For some systems, it
> would be a headache if there are lots of ledgers on a slow bookie.
> 
> Even worse, a slow bookie might actually store the entry and just respond
> slowly, so the entry is eventually stored at that bookie and contributes
> to the write quorum. But the client replaces it with a brand new bookie
> without adding any entries, which reduces the number of replicas of that
> entry and increases the work of auto-replication, doesn't it?
> 
> From my perspective, if we could avoid an unnecessary metadata access per
> ledger, we could save a lot of traffic to the metadata store for
> applications using a huge number of ledgers. The problem is especially bad
> if the metadata store is not optimized for writes.
> 
> -Sijie


Re: Question about AckSet.

Posted by Sijie Guo <gu...@gmail.com>.
> On the other hand, this is what the application asked us to do, so bk
> should fulfill its commitment.

The key point here is that the client replaces the slow bookie with a brand
new bookie but does not add any of the already ack'ed entries to the new
bookie. So it already doesn't fulfill the commitment; why do we need this
step, which introduces one metadata update? For some systems, it would be a
headache if there are lots of ledgers on a slow bookie.

Even worse, a slow bookie might actually store the entry and just respond
slowly, so the entry is eventually stored at that bookie and contributes to
the write quorum. But the client replaces it with a brand new bookie
without adding any entries, which reduces the number of replicas of that
entry and increases the work of auto-replication, doesn't it?

From my perspective, if we could avoid an unnecessary metadata access per
ledger, we could save a lot of traffic to the metadata store for
applications using a huge number of ledgers. The problem is especially bad
if the metadata store is not optimized for writes.

-Sijie


On Tue, Dec 11, 2012 at 10:19 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> In my understanding, the semantics we provide are closer to your option
> b), but not quite the way you put it. The contract is that upon an
> acknowledgement to a request to add an entry, we guarantee that we have at
> least ackQuorumSize copies stored in bookies. We send copies to more
> bookies (writeQuorumSize) to avoid slow bookies as you say.
>
> It is correct that if there aren't enough bookies we will throw an
> exception, and you imply that it is perhaps unnecessary, since the contract
> is not to store on writeQuorumSize bookies anyway. On the other hand, this
> is what the application asked us to do, so bk should fulfill its
> commitment. In the case there aren't enough bookies and the application
> gets an exception, it closes the ledger and creates a new one with fewer
> bookies.
>
> -Flavio

Re: Question about AckSet.

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
In my understanding, the semantics we provide are closer to your option b), but not quite the way you put it. The contract is that upon an acknowledgement to a request to add an entry, we guarantee that we have at least ackQuorumSize copies stored in bookies. We send copies to more bookies (writeQuorumSize) to avoid slow bookies as you say.
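
A small worked example of why sending to writeQuorumSize bookies but waiting
for only ackQuorumSize acks hides a slow bookie. This is a sketch only:
AckLatencySketch and ackTimeMillis are hypothetical names, and the latencies
are made-up numbers for illustration. The add is acknowledged when the
ackQuorumSize-th fastest response arrives.

```java
import java.util.Arrays;

// Hypothetical helper; the latencies used in main() are invented for illustration.
class AckLatencySketch {
    /** Time at which the add is acknowledged: the ackQuorumSize-th fastest response. */
    static long ackTimeMillis(long[] responseMillis, int ackQuorumSize) {
        if (ackQuorumSize < 1 || ackQuorumSize > responseMillis.length) {
            throw new IllegalArgumentException("ackQuorumSize must be in 1..writeQuorumSize");
        }
        long[] sorted = responseMillis.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        return sorted[ackQuorumSize - 1];
    }

    public static void main(String[] args) {
        long[] latencies = {5, 7, 900}; // bookies A, B and a slow C
        System.out.println(ackTimeMillis(latencies, 2)); // prints 7
        System.out.println(ackTimeMillis(latencies, 3)); // prints 900
    }
}
```

With ackQuorumSize equal to writeQuorumSize the slow bookie sets the latency
of every add; with a smaller ack quorum it does not.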

It is correct that if there aren't enough bookies we will throw an exception, and you imply that it is perhaps unnecessary, since the contract is not to store on writeQuorumSize bookies anyway. On the other hand, this is what the application asked us to do, so bk should fulfill its commitment. In the case there aren't enough bookies and the application gets an exception, it closes the ledger and creates a new one with fewer bookies.

-Flavio
