Posted to dev@bookkeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2017/08/02 15:21:32 UTC

Number of ledgers and checkpoint time - bookie does not scale ?

Hi,
I am trying to understand why overall performance slows down so dramatically
while my benchmark (see [1]) is running. The bench starts at 80.8 MB/s
throughput and drops to 10.2 MB/s.
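
To pinpoint when such a drop begins, throughput can be sampled per fixed
interval rather than averaged over the whole run. The following is a minimal
sketch only: the class ThroughputTracker and its methods are hypothetical and
are not taken from the benchmark in [1], and 1 MB is assumed to mean
1024 * 1024 bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: records write throughput per fixed-size interval. */
final class ThroughputTracker {
    private final long intervalNanos;
    private final List<Double> mbPerSecond = new ArrayList<>();
    private long intervalStart;
    private long bytesInInterval;

    ThroughputTracker(long intervalNanos, long startNanos) {
        this.intervalNanos = intervalNanos;
        this.intervalStart = startNanos;
    }

    /** Record a completed write of the given size at time nowNanos. */
    void record(long bytes, long nowNanos) {
        // Close every interval that ended before this write completed.
        while (nowNanos - intervalStart >= intervalNanos) {
            double seconds = intervalNanos / 1e9;
            mbPerSecond.add(bytesInInterval / (1024.0 * 1024.0) / seconds);
            bytesInInterval = 0;
            intervalStart += intervalNanos;
        }
        bytesInInterval += bytes;
    }

    /** One MB/s sample per closed interval, oldest first. */
    List<Double> samples() {
        return mbPerSecond;
    }

    /**
     * Index of the first interval whose throughput fell below
     * fraction * (throughput of the first interval), or -1 if none did.
     */
    int firstDropBelow(double fraction) {
        if (mbPerSecond.isEmpty()) {
            return -1;
        }
        double baseline = mbPerSecond.get(0);
        for (int i = 1; i < mbPerSecond.size(); i++) {
            if (mbPerSecond.get(i) < baseline * fraction) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Synthetic timestamps for illustration; a real run would pass System.nanoTime().
        long second = 1_000_000_000L;
        long mb = 1024L * 1024L;
        ThroughputTracker tracker = new ThroughputTracker(second, 0L);
        tracker.record(80 * mb, second / 2);
        tracker.record(80 * mb, second + second / 2);
        tracker.record(10 * mb, 2 * second + second / 2);
        tracker.record(0, 3 * second); // closes the last interval
        System.out.println("samples (MB/s): " + tracker.samples());
        System.out.println("first slow interval: " + tracker.firstDropBelow(0.5));
    }
}
```

Correlating the index of the first slow interval with the bookie log
timestamps would show whether the drop really coincides with a specific
bookie activity.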

I think the problem could be that I am creating a lot of ledgers on a
single bookie.
It seems that the more active ledgers I have, the more time the bookie takes
to process internal activities, like
IndexInMemPageMgr#flushOneOrMoreLedgers(true).

The slowdown starts right after the first log line like this:
org.apache.bookkeeper.bookie.EntryLogger flushRotatedLogs

I suspect that there is a piece of the story that I am missing.

This behavior is the same on BK 4.4 and BK 4.5

Is this possible?

[1]
https://github.com/eolivelli/bookkeepers-benchs/blob/master/src/test/java/BookKeeperWriteSynchClientsTest.java

Thanks

-- Enrico

Re: Number of ledgers and checkpoint time - bookie does not scale ?

Posted by Enrico Olivelli <eo...@gmail.com>.
2017-08-03 15:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

>
>
> 2017-08-02 19:21 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>
>>
>>
>> On Wed, Aug 2, 2017, 18:51 Sijie Guo <gu...@gmail.com> wrote:
>>
>>> On Wed, Aug 2, 2017 at 8:21 AM, Enrico Olivelli <eo...@gmail.com>
>>> wrote:
>>>
>>> > Hi,
>>> > I am trying to understand why overall performance slows down so
>>> > dramatically while my benchmark (see [1]) is running. The bench
>>> > starts at 80.8 MB/s throughput and drops to 10.2 MB/s.
>>> >
>>> > I think the problem could be that I am creating a lot of ledgers on a
>>> > single bookie.
>>> > It seems that the more active ledgers I have, the more time the bookie
>>> > takes to process internal activities, like
>>> > IndexInMemPageMgr#flushOneOrMoreLedgers(true).
>>>
>>>
>>> What is your disk layout? journal/ledger/index directory settings?
>>>
>>
>> All on one disk, in a tmp dir under target.
>> HDD vs. SSD makes no difference.
>>
>>>
>>> What is your ledger storage? Are you using sorted or not?
>>>
>>
>> Sorted
>>
>> Can you run the test? Just clone and run
>> mvn test -Dtest=BookKeeperWriteSynchClientsTest
>>
>> I am curious to know whether this happens on other laptops.
>>
>
>
> I am comparing two different machines:
> machine 1: Fedora 24 + SSD disk + 32 GB RAM + ext4 with mount options
> "defaults,relatime"
> machine 2: CentOS 6 + SSD disk + 32 GB RAM + ext4 with mount options
> "defaults,relatime"
>
> On machine 1, the slowdown appears after 320 ledgers have been created.
> On machine 2, the slowdown never appears.
>
> The machines are very similar; the real differences are:
> - OS
> - disk speed: on machine 2 the disk is slower, with an average throughput
> of 61 MB/s; on machine 1 we start from 80 MB/s and drop to 20 MB/s
>


Other info:

Running the client on machine 1 and bookie+zookeeper on machine 2, the bench
never slows down.
Running the client on machine 2 and bookie+zookeeper on machine 1, the bench
slows down as usual.

An interesting fact:
if I keep the bookie on machine 1 up (never restarting it), every time I start
the client on machine 2 the throughput is "good" at first, and after some time
performance "drops".

With these new tests I can say that:
- the problem occurs only when bookie+zookeeper is on machine 1
- the problem shows up after the client has been working for some time, even
without restarting the bookie

So the overall number of ledgers keeps growing across runs, yet each run
starts with good throughput; this is therefore not a "scalability" problem in
terms of the number of ledgers served per bookie.







Re: Number of ledgers and checkpoint time - bookie does not scale ?

Posted by Enrico Olivelli <eo...@gmail.com>.
2017-08-02 19:21 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

>
>
> On Wed, Aug 2, 2017, 18:51 Sijie Guo <gu...@gmail.com> wrote:
>
>> On Wed, Aug 2, 2017 at 8:21 AM, Enrico Olivelli <eo...@gmail.com>
>> wrote:
>>
>> > Hi,
>> > I am trying to understand why overall performance slows down so
>> > dramatically while my benchmark (see [1]) is running. The bench
>> > starts at 80.8 MB/s throughput and drops to 10.2 MB/s.
>> >
>> > I think the problem could be that I am creating a lot of ledgers on a
>> > single bookie.
>> > It seems that the more active ledgers I have, the more time the bookie
>> > takes to process internal activities, like
>> > IndexInMemPageMgr#flushOneOrMoreLedgers(true).
>>
>>
>> What is your disk layout? journal/ledger/index directory settings?
>>
>
> All on one disk, in a tmp dir under target.
> HDD vs. SSD makes no difference.
>
>>
>> What is your ledger storage? Are you using sorted or not?
>>
>
> Sorted
>
> Can you run the test? Just clone and run
> mvn test -Dtest=BookKeeperWriteSynchClientsTest
>
> I am curious to know whether this happens on other laptops.
>


I am comparing two different machines:
machine 1: Fedora 24 + SSD disk + 32 GB RAM + ext4 with mount options
"defaults,relatime"
machine 2: CentOS 6 + SSD disk + 32 GB RAM + ext4 with mount options
"defaults,relatime"

On machine 1, the slowdown appears after 320 ledgers have been created.
On machine 2, the slowdown never appears.

The machines are very similar; the real differences are:
- OS
- disk speed: on machine 2 the disk is slower, with an average throughput
of 61 MB/s; on machine 1 we start from 80 MB/s and drop to 20 MB/s
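
Since disk speed is one of the two differences listed, a quick synced-write
probe run on both machines could help separate disk behaviour from bookie
behaviour. This is a rough sketch under stated assumptions, not a proper disk
benchmark: the class FsyncProbe is hypothetical, the chunk size and count are
arbitrary, and syncing after every write only loosely resembles what a
write-ahead journal does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical probe: sequential writes with a sync after each chunk. */
final class FsyncProbe {

    /** Writes chunks * chunkBytes bytes into a temp file in dir; returns MB/s. */
    static double measure(Path dir, int chunkBytes, int chunks) throws IOException {
        Path file = Files.createTempFile(dir, "fsync-probe", ".bin");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocate(chunkBytes);
            long start = System.nanoTime();
            for (int i = 0; i < chunks; i++) {
                buffer.clear(); // reset position so the whole buffer is written again
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                channel.force(false); // push file data to the storage device
            }
            long elapsedNanos = Math.max(1L, System.nanoTime() - start);
            double megabytes = (double) chunkBytes * chunks / (1024.0 * 1024.0);
            return megabytes / (elapsedNanos / 1e9);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the directory to probe as the first argument; defaults to the current one.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.printf("synced sequential write: %.1f MB/s%n", measure(dir, 1 << 20, 64));
    }
}
```

Running it several times in a row on each machine, against the same
directory the bookie uses, would also show whether the disk itself degrades
over time independently of BookKeeper.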

-- Enrico




Re: Number of ledgers and checkpoint time - bookie does not scale ?

Posted by Enrico Olivelli <eo...@gmail.com>.
On Wed, Aug 2, 2017, 18:51 Sijie Guo <gu...@gmail.com> wrote:

> On Wed, Aug 2, 2017 at 8:21 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Hi,
> > I am trying to understand why overall performance slows down so
> > dramatically while my benchmark (see [1]) is running. The bench
> > starts at 80.8 MB/s throughput and drops to 10.2 MB/s.
> >
> > I think the problem could be that I am creating a lot of ledgers on a
> > single bookie.
> > It seems that the more active ledgers I have, the more time the bookie
> > takes to process internal activities, like
> > IndexInMemPageMgr#flushOneOrMoreLedgers(true).
>
>
> What is your disk layout? journal/ledger/index directory settings?
>

All on one disk, in a tmp dir under target.
HDD vs. SSD makes no difference.
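
For reference, a layout like this can be checked mechanically. The sketch
below reads a bookie-style properties file and reports whether the journal
directory is the same as, or nested with, a ledger or index directory. The
class BookieLayoutCheck is hypothetical, the property names journalDirectory,
ledgerDirectories and indexDirectories are assumed to match the bookie server
configuration of the version in use, and the paths are example values. It only
compares paths, so it cannot tell whether separate directories sit on the same
physical device.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

/** Hypothetical check for overlapping journal and ledger/index directories. */
final class BookieLayoutCheck {

    /** Splits a comma-separated directory list into normalized absolute paths. */
    static Set<Path> dirs(Properties conf, String key) {
        Set<Path> result = new LinkedHashSet<>();
        for (String part : conf.getProperty(key, "").split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(Paths.get(trimmed).toAbsolutePath().normalize());
            }
        }
        return result;
    }

    /** True if a journal directory equals, contains, or lies inside a data directory. */
    static boolean journalSharesDirectory(Properties conf) {
        Set<Path> journal = dirs(conf, "journalDirectory");
        Set<Path> data = new LinkedHashSet<>(dirs(conf, "ledgerDirectories"));
        data.addAll(dirs(conf, "indexDirectories"));
        for (Path j : journal) {
            for (Path d : data) {
                if (j.startsWith(d) || d.startsWith(j)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Example values only: everything under one directory, as in a test setup.
        String example =
                "journalDirectory=target/bk\n"
              + "ledgerDirectories=target/bk\n"
              + "indexDirectories=target/bk\n";
        Properties conf = new Properties();
        conf.load(new StringReader(example));
        System.out.println("journal shares a directory: " + journalSharesDirectory(conf));
    }
}
```

With everything under a single directory the check reports a shared layout,
which means journal writes and ledger/index flushes compete for the same
device.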

>
> What is your ledger storage? Are you using sorted or not?
>

Sorted

Can you run the test? Just clone and run
mvn test -Dtest=BookKeeperWriteSynchClientsTest

I am curious to know whether this happens on other laptops.

Enrico

-- 


-- Enrico Olivelli

Re: Number of ledgers and checkpoint time - bookie does not scale ?

Posted by Sijie Guo <gu...@gmail.com>.
On Wed, Aug 2, 2017 at 8:21 AM, Enrico Olivelli <eo...@gmail.com> wrote:

> Hi,
> I am trying to understand why overall performance slows down so dramatically
> while my benchmark (see [1]) is running. The bench starts at 80.8 MB/s
> throughput and drops to 10.2 MB/s.
>
> I think the problem could be that I am creating a lot of ledgers on a
> single bookie.
> It seems that the more active ledgers I have, the more time the bookie takes
> to process internal activities, like
> IndexInMemPageMgr#flushOneOrMoreLedgers(true).


What is your disk layout? journal/ledger/index directory settings?

What is your ledger storage? Are you using sorted or not?

