You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by Ivan Kelly <iv...@apache.org> on 2018/08/23 10:09:44 UTC

OrderedScheduler & OrderedExecutor in bookkeeper client

Hi folks,

We currently create an OrderedExecutor and an OrderedScheduler in the
client. An OrderedScheduler is an OrderedExecutor. Moreover, it's very
seldom used (basically for polling LAC, speculative reads and explicit
flush.

I propose that we fold these into one. i.e. construct an
OrderedScheduler, constructed with the same parameters as we build the
OrderedExecutor now, and use this OrderedScheduler for all cases
rather than the OrderedExecutor.

(this is the first in a bunch of changes I'll be proposing to clean up
the client thread model).

-Ivan

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

Posted by Ivan Kelly <iv...@apache.org>.
> In the case of OrderedExecutor, it needs a BlockingQueue and the current
> default is to use JDK LinkedBlockingQueue which relies on CAS for
> enqueue/dequeue. Additional room for improvement here is to use a more
> specialized MP-SC queue with different wait strategies.

+1 to this, though something to keep in mind here is that these aren't
BK's cores we're running on, they belong to the client application.

> Overall, I think we should still keep separated the critical data path from
> the rest of operations that are not necessarily time critical.

Keeping the scheduling threads and the executor threads separate means
that we need monitors, which can mean potential blocking, and other
threads invalidating the cpu cache of thread could potentially be kept
very warm.

>> What I want to achieve here is that, after a operation is submitted to
> a ledger, it only ever operates on a single thread.
>
> I think a viable option would be to keep the scheduler separate and then
> always jump in the ledger thread. eg:
>
> scheduler.schedule(() -> {
>     executor.executeOrdered(ledgerId, SafeRunnable.safeRun(() -> {
>         // do something
>     });
> }, 100, TimeUnit.MILLISECONDS);
>

This is basically what I was suggesting, except that we hide this
behind the OrderedScheduler interface. Passing in both the scheduler
and the executor allows the user to run stuff on another thread
unwittingly.

-Ivan

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

Posted by Matteo Merli <ma...@gmail.com>.
The reason for the performance difference between the OrderedExecutor and
the OrderedScheduler is
the underlying queue that is used to pass tasks to the executor thread.

In the case of OrderedExecutor, it needs a BlockingQueue and the current
default is to use JDK LinkedBlockingQueue which relies on CAS for
enqueue/dequeue. Additional room for improvement here is to use a more
specialized MP-SC queue with different wait strategies.

For OrderedScheduler, it needs a priority queue, since tasks can be either
immediate or scheduler for future execution. With priority queue, a mutex
is typically necessary and will introduce contention when there are
multiple threads passing tasks to the scheduler.

Overall, I think we should still keep separated the critical data path from
the rest of operations that are not necessarily time critical.

> What I want to achieve here is that, after a operation is submitted to
a ledger, it only ever operates on a single thread.

I think a viable option would be to keep the scheduler separate and then
always jump in the ledger thread. eg:

scheduler.schedule(() -> {
    executor.executeOrdered(ledgerId, SafeRunnable.safeRun(() -> {
        // do something
    });
}, 100, TimeUnit.MILLISECONDS);

If the scheduler is not always used, the thread could also be lazily
created.

Matteo

On Thu, Aug 23, 2018 at 8:48 AM Ivan Kelly <iv...@apache.org> wrote:

> > I don't think it is accidently. OrderedExecutor has performance
> advantages
> > than OrderedScheduler.
> >
> > A bit background on this:
> >
> > - OrderedScheduler was introduced by me. And I changed existing
> > OrderedSafeExecutor to be extending from OrderedScheduler.
> >   Trying to standarize to one `OrderedScheduler`.
> > - I think Matteo noticed performance difference between `executor` and
> > `scheduler`, so he made the change
> >
> >
> https://github.com/apache/bookkeeper/commit/46171e67e526702487641438144f28b7eb1aa07b
> > .
> > So the `executor` is used as the main callback executor, since it just
> > requires ordering but doesn't need scheduling capability.
> > the `scheduler` is used for scheduling tasks but doesn't require
> ordering.
>
> The scheduler does need ordering in one case for explicit Lac.
>
> I think we could modify the scheduler, so that we could use the same
> scheduler object for the executor and for scheduling.
>
> Instead of having multiple executors in the scheduler, just create one
> scheduled executor, which then submits to the executor service after
> the delay.
>
> What I want to achieve here is that, after a operation is submitted to
> a ledger, it only ever operates on a single thread.
> If you look at LedgerHandle now, you have to jump around 4 files to
> deduce which thread methods like handleBookieFailure or
> sendAddSuccessCallbacks are called on, and even then you can't even be
> sure, so we wrap everything in synchronized when we don't really need
> to.
>
> -Ivan
>
-- 
Matteo Merli
<mm...@apache.org>

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

Posted by Ivan Kelly <iv...@apache.org>.
> I don't think it is accidently. OrderedExecutor has performance advantages
> than OrderedScheduler.
>
> A bit background on this:
>
> - OrderedScheduler was introduced by me. And I changed existing
> OrderedSafeExecutor to be extending from OrderedScheduler.
>   Trying to standarize to one `OrderedScheduler`.
> - I think Matteo noticed performance difference between `executor` and
> `scheduler`, so he made the change
>
> https://github.com/apache/bookkeeper/commit/46171e67e526702487641438144f28b7eb1aa07b
> .
> So the `executor` is used as the main callback executor, since it just
> requires ordering but doesn't need scheduling capability.
> the `scheduler` is used for scheduling tasks but doesn't require ordering.

The scheduler does need ordering in one case for explicit Lac.

I think we could modify the scheduler, so that we could use the same
scheduler object for the executor and for scheduling.

Instead of having multiple executors in the scheduler, just create one
scheduled executor, which then submits to the executor service after
the delay.

What I want to achieve here is that, after a operation is submitted to
a ledger, it only ever operates on a single thread.
If you look at LedgerHandle now, you have to jump around 4 files to
deduce which thread methods like handleBookieFailure or
sendAddSuccessCallbacks are called on, and even then you can't even be
sure, so we wrap everything in synchronized when we don't really need
to.

-Ivan

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

Posted by Sijie Guo <gu...@gmail.com>.
On Thu, Aug 23, 2018 at 6:35 AM Ivan Kelly <iv...@apache.org> wrote:

> >> We currently create an OrderedExecutor and an OrderedScheduler in the
> >> client. An OrderedScheduler is an OrderedExecutor. Moreover, it's very
> >> seldom used (basically for polling LAC, speculative reads and explicit
> >> flush.
> >
> > Why do they exist? Are they only legacy from past or is there any
> specific
> > reason?
>
> Accident I think.


I don't think it is accidently. OrderedExecutor has performance advantages
than OrderedScheduler.

A bit background on this:

- OrderedScheduler was introduced by me. And I changed existing
OrderedSafeExecutor to be extending from OrderedScheduler.
  Trying to standarize to one `OrderedScheduler`.
- I think Matteo noticed performance difference between `executor` and
`scheduler`, so he made the change

https://github.com/apache/bookkeeper/commit/46171e67e526702487641438144f28b7eb1aa07b
.

So the `executor` is used as the main callback executor, since it just
requires ordering but doesn't need scheduling capability.
the `scheduler` is used for scheduling tasks but doesn't require ordering.


> Previously, OrderedExecutor didn't implement
> scheduling, and BookKeeper had a ScheduledExecutorService member.
> This changed with
>
> https://github.com/apache/bookkeeper/commit/46171e67e526702487641438144f28b7eb1aa07b
> .
>
> > Is there any per ledger ordering requirement that we will break (or cause
> > deadlock) if we have a single queue per ledger?
>
> Any ordering that depends on this is broken. No operations should
> block on either of these executors, and I don't think any do.
>
> -Ivan
>

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

Posted by Ivan Kelly <iv...@apache.org>.
>> We currently create an OrderedExecutor and an OrderedScheduler in the
>> client. An OrderedScheduler is an OrderedExecutor. Moreover, it's very
>> seldom used (basically for polling LAC, speculative reads and explicit
>> flush.
>
> Why do they exist? Are they only legacy from past or is there any specific
> reason?

Accident I think. Previously, OrderedExecutor didn't implement
scheduling, and BookKeeper had a ScheduledExecutorService member.
This changed with
https://github.com/apache/bookkeeper/commit/46171e67e526702487641438144f28b7eb1aa07b.

> Is there any per ledger ordering requirement that we will break (or cause
> deadlock) if we have a single queue per ledger?

Any ordering that depends on this is broken. No operations should
block on either of these executors, and I don't think any do.

-Ivan

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
Il gio 23 ago 2018, 12:09 Ivan Kelly <iv...@apache.org> ha scritto:

> Hi folks,
>
> We currently create an OrderedExecutor and an OrderedScheduler in the
> client. An OrderedScheduler is an OrderedExecutor. Moreover, it's very
> seldom used (basically for polling LAC, speculative reads and explicit
> flush.
>

Why do they exist? Are they only legacy from past or is there any specific
reason?

Is there any per ledger ordering requirement that we will break (or cause
deadlock) if we have a single queue per ledger?

These are my only two concerns.

Overall the idea makes sense to me and I will support it

Enrico


> I propose that we fold these into one. i.e. construct an
> OrderedScheduler, constructed with the same parameters as we build the
> OrderedExecutor now, and use this OrderedScheduler for all cases
> rather than the OrderedExecutor.
>
> (this is the first in a bunch of changes I'll be proposing to clean up
> the client thread model).
>
> -Ivan
>
-- 


-- Enrico Olivelli