Posted to user@bookkeeper.apache.org by Maciej Smoleński <je...@gmail.com> on 2015/06/10 13:08:16 UTC

Low write bandwidth

Hello,

I'm testing BK performance when appending 100 KB entries synchronously from
one thread (using a single ledger).
The performance I get is 250 entries/s.

What performance should I expect?

My setup:

Ledger:
Ensemble size: 3
Quorum size: 2

1 client machine and 3 server machines.

Network:
Each machine uses bonding: 4 x 1000 Mbps links.
Bandwidth manually tested between client and server: 400 MB/s.

Disk:
I tested two configurations:
- dedicated disks with ext3 (a separate disk each for zookeeper, journal,
data, index, and log)
- dedicated ramfs partitions (a separate partition each for zookeeper,
journal, data, index, and log)

In both configurations the performance is the same: 250 entries/s (25 MB/s).
I confirmed this with the measured network bandwidth:
- on the client: 50 MB/s
- on each server: 17 MB/s

I ran Java with a profiler enabled on the BK client and the BK server but
didn't find anything unexpected (though I don't know the BookKeeper
internals).

I tested it with two BookKeeper versions:
- 4.3.0
- 4.2.2
The results were the same with both versions.

What should be changed or checked to get better performance?

Kind regards,
Maciej
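
The benchmark code itself isn't shown in this thread, but a minimal sketch of
the loop described above (synchronous adds of 100 KB entries to a single
ledger with ensemble 3 and quorum 2, using the public BookKeeper 4.x client
API; the ZooKeeper address and ledger password are placeholders) might look
like this:

import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

public class SyncAddBench {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper connect string.
        BookKeeper bk = new BookKeeper("zk-host:2181");
        // Ensemble size 3, quorum size 2, as in the setup above.
        LedgerHandle lh = bk.createLedger(3, 2,
                BookKeeper.DigestType.CRC32, "password".getBytes());

        byte[] entry = new byte[100 * 1024];   // one 100 KB entry
        int n = 3000;
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            lh.addEntry(entry);                // blocks until the quorum acks
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.printf("%d entries in %d ms -> %.1f entries/s%n",
                n, elapsed, n * 1000.0 / elapsed);
        lh.close();
        bk.close();
    }
}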

Re: Low write bandwidth

Posted by Flavio Junqueira <fp...@yahoo.com>.
To get something like 400 MB/s and make it durable, you essentially need to
stream straight from the network to disk. The best SSD drives I'm aware of
give you roughly no more than 500 MB/s, so what you're asking for is pretty
close to the limit of what storage devices can deliver today. Also keep in
mind (and Robin may correct me here if my knowledge is outdated at this
point) that we don't really stream bytes like that: an entry is treated as a
unit, which means that we compute the CRC for a whole entry and process each
add-entry request as a unit. If you add entries synchronously, you're not
properly pipelining requests and consequently not getting the best
performance out of the system.
-Flavio
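
As a rough sketch of the pipelining Flavio describes (asyncAddEntry is the
standard LedgerHandle API; the window size of 16 and the error handling here
are illustrative assumptions, not from this thread):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import org.apache.bookkeeper.client.AsyncCallback.AddCallback;
import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.LedgerHandle;

public class PipelinedAdds {
    // Keeps several adds in flight so network transfer and journal writes
    // on the bookies overlap instead of running strictly one at a time.
    static void addPipelined(LedgerHandle lh, byte[] entry, int n)
            throws InterruptedException {
        final Semaphore window = new Semaphore(16);   // assumed window size
        final CountDownLatch done = new CountDownLatch(n);
        AddCallback cb = new AddCallback() {
            @Override
            public void addComplete(int rc, LedgerHandle h, long entryId,
                                    Object ctx) {
                if (rc != BKException.Code.OK) {
                    System.err.println("add failed, rc=" + rc);
                }
                window.release();
                done.countDown();
            }
        };
        for (int i = 0; i < n; i++) {
            window.acquire();                  // bound outstanding requests
            lh.asyncAddEntry(entry, cb, null);
        }
        done.await();                          // wait for all acks
    }
}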



Re: Low write bandwidth

Posted by Maciej Smoleński <je...@gmail.com>.
I ran a test and measured some TCP packet statistics; see below.
The stats cover an extended period: 4 seconds before the test and 2 seconds
after.

I tested with 3000 requests, each 100 KB (quorum size is 2).
The performance is actually 310 entries/s now.
Earlier it was 250 entries/s, as there was some unnecessary logging (the
log4j config was not on the classpath) - detected later with the profiler.
The columns are: timestamp avgPacketSizeInBytes tcpPacketsNumber
1433952597 52.00 2
1433952598 40.46 0
1433952599 42.21 0
1433952600 46.00 0
1433952601 8238.27 3103
1433952602 8739.31 7799
1433952603 8441.14 8173
1433952604 8820.88 7761
1433952605 8814.23 8232
1433952606 8790.77 8257
1433952607 8602.83 8649
1433952608 8849.24 8156
1433952609 8710.08 9000
1433952610 8809.13 8839
1433952611 8089.91 418
1433952612 46.00 0
1433952613 46.00 0
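
Rough arithmetic on these numbers (assuming the counters were taken on the
client side and cover traffic to both quorum bookies): the steady-state
seconds show about 8,200 packets/s at an average of ~8.7 KB, i.e. roughly
70 MB/s on the wire, versus a payload rate of 310 entries/s x 100 KB x 2
replicas ≈ 62 MB/s. That works out to 8200 / 310 ≈ 26 packets per entry,
consistent with a 100 KB entry fragmenting into roughly 12-13 data packets
per bookie plus acknowledgements.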


Re: Low write bandwidth

Posted by Maciej Smoleński <je...@gmail.com>.
I want durability in the final product. Testing with ramfs shows what
performance could be achieved with faster storage (SSD, NVRAM).

I'm not a network expert. I will try to find a command to measure this;
please let me know if you know how to get this data.



Re: Low write bandwidth

Posted by Robin Dhamankar <ro...@apache.org>.
Sorry, I missed that you also benchmarked this with ramfs. So you don't need
the data to be durable, I presume?

Can you measure how many TCP packets are being transmitted per entry? We can
potentially get some gains by tuning those settings.

Are you saying you have only one request outstanding at a time, and the
previous request has to be acknowledged before the next request can be sent?

If that is the case, given that there is a durable write to the journal
required before an add is acknowledged by the bookie, there isn't much more
room to improve beyond the 250 requests per second you are currently getting.

Re: Low write bandwidth

Posted by Sijie Guo <si...@apache.org>.
Just out of curiosity, does your system allow only one outstanding request
overall, or just one outstanding request per ledger? Without overlapping
network transport and request processing, it is hard to fully utilize the
resources (either network or disk).

- Sijie


Re: Low write bandwidth

Posted by Maciej Smoleński <je...@gmail.com>.
Yes, I have only one request outstanding at a time.

With 1 KB requests I got more than 1000 requests/s.

With 100 KB requests I only get 250 requests/s.
Only 1/8 of the network bandwidth is used.
I tested it with physical disks (ext3) and with ramfs, and the performance
was the same: 250 requests/s.

Re: Low write bandwidth

Posted by Robin Dhamankar <ro...@gmail.com>.
Are you saying you have only one request outstanding at a time, and the
previous request has to be acknowledged before the next request can be sent?

If that is the case, given that there is a durable write to the journal
required before an add is acknowledged by the bookie, there isn't much more
room to improve beyond the 250 requests per second you are currently getting.

Re: Low write bandwidth

Posted by Aniruddha Laud <tr...@gmail.com>.
On Wed, Jun 10, 2015 at 8:38 AM, Maciej Smoleński <je...@gmail.com> wrote:

> I ran ping -s 65000 and the results are below.
> Latency is always < 1.5 ms.
> Does it mean that transporting a single entry will use two packets, and
> the latency will be 2.5 ms (1.5 ms for the 65 KB packet and 1 ms for the
> 35 KB packet => 2.5 ms for 100 KB)?
>
No, ping gives you the round-trip time, but you can expect the actual
latency to be a little higher than this number (the best way to find out
would be to monitor the actual write latency to the server; BookKeeper
clients expose these metrics).

> Is it possible to improve this? Is it possible to increase the packet size
> so that a single entry fits in a single packet?
>
You can't increase the IP packet size. You'd have to reduce the size of each
entry to avoid fragmentation.
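
One way to monitor that write latency from the application side is a timing
wrapper around the synchronous add loop from the original post; this is an
illustrative sketch, not from the thread, and assumes lh is an open ledger:

// Probe the per-add latency around the synchronous API; lh is an open
// org.apache.bookkeeper.client.LedgerHandle created as in the benchmark.
byte[] entry = new byte[100 * 1024];
for (int i = 0; i < 100; i++) {
    long t0 = System.nanoTime();
    lh.addEntry(entry);                            // blocks until quorum acks
    long micros = (System.nanoTime() - t0) / 1000;
    System.out.println("add #" + i + " took " + micros + " us");
}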


Re: Low write bandwidth

Posted by Maciej Smoleński <je...@gmail.com>.
I ran ping -s 65000 and the results are below.
Latency is always < 1.5 ms.
Does it mean that transporting a single entry will use two packets, and the
latency will be 2.5 ms (1.5 ms for the 65 KB packet and 1 ms for the 35 KB
packet => 2.5 ms for 100 KB)?
Is it possible to improve this? Is it possible to increase the packet size
so that a single entry fits in a single packet?



ping/from_client_to_server1
PING SN0101 (169.254.1.31) 65000(65028) bytes of data.
65008 bytes from SN0101 (169.254.1.31): icmp_seq=1 ttl=64 time=1.39 ms
65008 bytes from SN0101 (169.254.1.31): icmp_seq=2 ttl=64 time=1.29 ms
65008 bytes from SN0101 (169.254.1.31): icmp_seq=3 ttl=64 time=1.29 ms
65008 bytes from SN0101 (169.254.1.31): icmp_seq=4 ttl=64 time=1.31 ms
65008 bytes from SN0101 (169.254.1.31): icmp_seq=5 ttl=64 time=1.32 ms

ping/from_client_to_server2
PING SN0102 (169.254.1.32) 65000(65028) bytes of data.
65008 bytes from SN0102 (169.254.1.32): icmp_seq=1 ttl=64 time=1.26 ms
65008 bytes from SN0102 (169.254.1.32): icmp_seq=2 ttl=64 time=1.31 ms
65008 bytes from SN0102 (169.254.1.32): icmp_seq=3 ttl=64 time=1.12 ms
65008 bytes from SN0102 (169.254.1.32): icmp_seq=4 ttl=64 time=1.27 ms
65008 bytes from SN0102 (169.254.1.32): icmp_seq=5 ttl=64 time=1.37 ms

ping/from_client_to_server3
PING SN0103 (169.254.1.33) 65000(65028) bytes of data.
65008 bytes from SN0103 (169.254.1.33): icmp_seq=1 ttl=64 time=1.25 ms
65008 bytes from SN0103 (169.254.1.33): icmp_seq=2 ttl=64 time=1.38 ms
65008 bytes from SN0103 (169.254.1.33): icmp_seq=3 ttl=64 time=1.25 ms
65008 bytes from SN0103 (169.254.1.33): icmp_seq=4 ttl=64 time=1.33 ms
65008 bytes from SN0103 (169.254.1.33): icmp_seq=5 ttl=64 time=1.32 ms

ping/from_server1_to_client
PING AN0101 (169.254.1.11) 65000(65028) bytes of data.
65008 bytes from AN0101 (169.254.1.11): icmp_seq=1 ttl=64 time=1.01 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=2 ttl=64 time=1.38 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=3 ttl=64 time=1.35 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=4 ttl=64 time=1.35 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=5 ttl=64 time=1.32 ms

ping/from_server2_to_client
PING AN0101 (169.254.1.11) 65000(65028) bytes of data.
65008 bytes from AN0101 (169.254.1.11): icmp_seq=1 ttl=64 time=0.887 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=2 ttl=64 time=1.31 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=3 ttl=64 time=1.32 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=4 ttl=64 time=0.998 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=5 ttl=64 time=1.22 ms

ping/from_server3_to_client
PING AN0101 (169.254.1.11) 65000(65028) bytes of data.
65008 bytes from AN0101 (169.254.1.11): icmp_seq=1 ttl=64 time=1.08 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=2 ttl=64 time=1.40 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=3 ttl=64 time=1.07 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=4 ttl=64 time=1.26 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=5 ttl=64 time=1.26 ms
65008 bytes from AN0101 (169.254.1.11): icmp_seq=6 ttl=64 time=1.26 ms


Re: Low write bandwidth

Posted by Aniruddha Laud <tr...@gmail.com>.

To saturate the bandwidth, you will have to have more than one outstanding
request. 250 requests/second gives you 4 ms per request. With each entry
100 KB in size, that's not unreasonable. My suggestion would be to monitor
the write latency from the client to the server.

ping -s 65000 should give you a baseline for what to expect with latencies.

With 100 KB entries, you are going to see fragmentation at both the IP and
the Ethernet layer. That wasn't the case with the 1 KB payload.

How many hops does a packet need to make from one machine to another? The
more hops, the higher the latency.
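
A quick back-of-the-envelope check of that point: with a single outstanding
request, throughput is bounded by entry size over per-add latency, i.e.
100 KB / 4 ms = 25 MB/s, exactly the observed rate. To approach the 400 MB/s
the network test showed, roughly 400 / 25 = 16 adds would need to be in
flight at once (assuming, optimistically, that nothing on the bookie side,
such as journal throughput, becomes the bottleneck first).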



Re: Low write bandwidth

Posted by Maciej Smoleński <je...@gmail.com>.
Thank you for your comment.

Unfortunately, these options will not help in my case.
In my case the BookKeeper client will receive the next request only when the
previous request is confirmed.
It is also expected that there will be only a single stream of such requests.

I would like to understand how to achieve performance equal to the network
bandwidth.

Re: Low write bandwidth

Posted by Flavio Junqueira <fp...@yahoo.com>.
BK currently isn't wired to stream bytes to a ledger, so writing large
entries synchronously, as you're doing, is likely not to get the best
performance out of it. A couple of things you could try to get higher
performance are to write asynchronously and to have multiple clients
writing.
-Flavio