You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@bookkeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2017/08/04 12:27:39 UTC

Digests are computed inside the mailWorkerPool

Hi bookkeepers,
I noticed that digests are computed on the mailWorkerPool during addEntry
I think that it is better to do such computation outside the pool


I have drafted a patch to share my code
https://github.com/apache/bookkeeper/pull/389
the patch covers only LedgerHandleAdv as it is just a proof of concept

opinions ?

-- Enrico

Re: Digests are computed inside the mailWorkerPool

Posted by Enrico Olivelli <eo...@gmail.com>.

Il ven 4 ago 2017, 19:29 Sijie Guo <gu...@gmail.com> ha scritto:

> On Fri, Aug 4, 2017 at 8:50 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > 2017-08-04 17:38 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >
> > > On Aug 4, 2017 5:27 AM, "Enrico Olivelli" <eo...@gmail.com> wrote:
> > >
> > > Hi bookkeepers,
> > > I noticed that digests are computed on the mailWorkerPool during
> addEntry
> > > I think that it is better to do such computation outside the pool
> > >
> > >
> > > I have drafted a patch to share my code
> > > https://github.com/apache/bookkeeper/pull/389
> > > the patch covers only LedgerHandleAdv as it is just a proof of concept
> > >
> > > opinions ?
> > >
> > >
> > > Because there is always a thread need to run this computation, I am not
> > > sure how much performance gain you can get from this?
> > >
> > > Do you have any performance results before making any such changes?
> > >
> >
> > not yet
> >
> > I see your point: as the number of CPUs is limited from an higher point
> of
> > view the global throughput will not change
> >
> > My idea comes from the fact the addEntry is called from applications
> > threads, the mainWorkerPool is used for delivering callbacks, if I keep
> > busy a thread from that pool and it is bounded in size I am limiting the
> > capacity to deliver callbacks to application threads which are waiting
> for
> > acks of adds or reads.
> >
>
> But if you are looking from end-to-end perspective (from you issue the
> request to receiving the callback), there isn't really difference. Because
> the digest computation needs to run in some threads, your throughput and
> latency will still be bounded.
>

This make a lot of sense.
Now I think that the change will not add improvements and maybe it would
alter actual behaviour of applications !
It is better to leave the computation in bk internal threads

Thank you Sijie !
Enrico


> Especially if you think the problem from other services (e.g. pulsar
> brokers, dl proxies), I doubt it will give you any performance improvement.
>
>
> >
> > Another point is that it easy to track them/profile "application"
> threads,
> > the mainWorkerPool is shared and does many kind of operations.
> >
>
> It might be `easy` for a specific application, but not really `easy` for a
> lot of applications. Especially for the applications that writes a lot of
> ledgers.
>
>
> >
> > I will do some tests and share my results (as soon as I will be  back
> from
> > vacation)
> >
>
> Yes. Please show the numbers, and please cover as many use cases as
> possible.
>
>
> >
> > -- Enrico
> >
> >
> >
> > >
> > > Sijie
> > >
> > >
> > > -- Enrico
> > >
> >
>
-- 


-- Enrico Olivelli

Re: Digests are computed inside the mailWorkerPool

Posted by Sijie Guo <gu...@gmail.com>.

On Fri, Aug 4, 2017 at 8:50 AM, Enrico Olivelli <eo...@gmail.com> wrote:

> 2017-08-04 17:38 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>
> > On Aug 4, 2017 5:27 AM, "Enrico Olivelli" <eo...@gmail.com> wrote:
> >
> > Hi bookkeepers,
> > I noticed that digests are computed on the mailWorkerPool during addEntry
> > I think that it is better to do such computation outside the pool
> >
> >
> > I have drafted a patch to share my code
> > https://github.com/apache/bookkeeper/pull/389
> > the patch covers only LedgerHandleAdv as it is just a proof of concept
> >
> > opinions ?
> >
> >
> > Because there is always a thread need to run this computation, I am not
> > sure how much performance gain you can get from this?
> >
> > Do you have any performance results before making any such changes?
> >
>
> not yet
>
> I see your point: as the number of CPUs is limited from an higher point of
> view the global throughput will not change
>
> My idea comes from the fact the addEntry is called from applications
> threads, the mainWorkerPool is used for delivering callbacks, if I keep
> busy a thread from that pool and it is bounded in size I am limiting the
> capacity to deliver callbacks to application threads which are waiting for
> acks of adds or reads.
>

But if you are looking from end-to-end perspective (from you issue the
request to receiving the callback), there isn't really difference. Because
the digest computation needs to run in some threads, your throughput and
latency will still be bounded.

Especially if you think the problem from other services (e.g. pulsar
brokers, dl proxies), I doubt it will give you any performance improvement.

>
> Another point is that it easy to track them/profile "application" threads,
> the mainWorkerPool is shared and does many kind of operations.
>

It might be `easy` for a specific application, but not really `easy` for a
lot of applications. Especially for the applications that writes a lot of
ledgers.

>
> I will do some tests and share my results (as soon as I will be  back from
> vacation)
>

Yes. Please show the numbers, and please cover as many use cases as
possible.

>
> -- Enrico
>
>
>
> >
> > Sijie
> >
> >
> > -- Enrico
> >
>

Re: Digests are computed inside the mailWorkerPool

Posted by Enrico Olivelli <eo...@gmail.com>.

2017-08-04 17:38 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> On Aug 4, 2017 5:27 AM, "Enrico Olivelli" <eo...@gmail.com> wrote:
>
> Hi bookkeepers,
> I noticed that digests are computed on the mailWorkerPool during addEntry
> I think that it is better to do such computation outside the pool
>
>
> I have drafted a patch to share my code
> https://github.com/apache/bookkeeper/pull/389
> the patch covers only LedgerHandleAdv as it is just a proof of concept
>
> opinions ?
>
>
> Because there is always a thread need to run this computation, I am not
> sure how much performance gain you can get from this?
>
> Do you have any performance results before making any such changes?
>

not yet

I see your point: as the number of CPUs is limited from an higher point of
view the global throughput will not change

My idea comes from the fact the addEntry is called from applications
threads, the mainWorkerPool is used for delivering callbacks, if I keep
busy a thread from that pool and it is bounded in size I am limiting the
capacity to deliver callbacks to application threads which are waiting for
acks of adds or reads.

Another point is that it easy to track them/profile "application" threads,
the mainWorkerPool is shared and does many kind of operations.

I will do some tests and share my results (as soon as I will be  back from
vacation)

-- Enrico

>
> Sijie
>
>
> -- Enrico
>

Re: Digests are computed inside the mailWorkerPool

Posted by Sijie Guo <gu...@gmail.com>.

On Aug 4, 2017 5:27 AM, "Enrico Olivelli" <eo...@gmail.com> wrote:

Hi bookkeepers,
I noticed that digests are computed on the mailWorkerPool during addEntry
I think that it is better to do such computation outside the pool


I have drafted a patch to share my code
https://github.com/apache/bookkeeper/pull/389
the patch covers only LedgerHandleAdv as it is just a proof of concept

opinions ?


Because there is always a thread need to run this computation, I am not
sure how much performance gain you can get from this?

Do you have any performance results before making any such changes?

Sijie


-- Enrico