Posted to user@accumulo.apache.org by Mario Pastorelli <ma...@teralytics.ch> on 2016/12/05 16:37:53 UTC

HDFS vs Accumulo Performance

We are trying to understand Accumulo performance to better plan our future
products that use it and we noticed that the read speed of Accumulo tends
to be way lower than what we would expect. We have a testing cluster with 4
HDFS+Accumulo nodes and we ran some tests. We wrote two programs to write
to HDFS and Accumulo and two programs to read/scan from HDFS and Accumulo
the same number of records containing random bytes. We ran all the programs
from outside the cluster, on another node in the rack that runs neither
HDFS nor Accumulo.

We also wrote all the HDFS blocks and Accumulo tablets to the same machine
in the cluster.

First of all, we wrote 10M entries to HDFS, where each entry was 50 bytes.
This resulted in 4 blocks on HDFS. Reading these records with an
FSDataInputStream takes around 5.7 seconds with an average speed of around
90MB per second.
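
For reference, our HDFS read loop is essentially the following sketch (the
path, record size and record count are placeholders here, not our exact code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadTest {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/benchmark/random-records"); // placeholder path
        byte[] record = new byte[50];                       // each record is 50 bytes
        long count = 0;
        long start = System.currentTimeMillis();
        try (FSDataInputStream in = fs.open(path)) {
          for (int i = 0; i < 10_000_000; i++) {            // 10M records
            in.readFully(record);
            count++;
          }
        }
        long ms = System.currentTimeMillis() - start;
        System.out.println(count + " records read in " + ms + " ms");
      }
    }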

Then we wrote 10M entries to Accumulo, where each entry has a row of 50 random
bytes, no column and no value. Writing is as fast as writing to HDFS, modulo
the compaction that we run at the end. The generated table has 1 tablet and
therefore 10M records all on the same node. We waited for the compaction
to finish, then we opened a scanner without setting the range and we read
all the records. This time, reading the data took around 20 seconds with an
average speed of 25MB/s and 500000 records/s, together with ~500 seeks/s. We
have three questions about this result:

1 - Is this kind of performance expected?

2 - Is there any configuration that we can change to improve the scan speed?

3 - Why are there ~500 seeks per second if there is only one tablet and we read
all its bytes sequentially? What are those seeks doing?

We tried to use a BatchScanner with 1, 5 and 10 threads but the speed was
the same or even worse in some cases.
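
For reference, the scan side is essentially the following sketch (instance
name, ZooKeeper hosts, credentials and table name are placeholders, not our
actual setup; the BatchScanner variant we tried is sketched in the trailing
comment):

    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class AccumuloScanTest {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
            .getConnector("user", new PasswordToken("secret"));

        // Plain Scanner with no range set: read the whole (single-tablet) table.
        Scanner scanner = conn.createScanner("random_rows", Authorizations.EMPTY);
        long count = 0;
        long start = System.currentTimeMillis();
        for (Entry<Key,Value> entry : scanner) {
          count++;
        }
        long ms = System.currentTimeMillis() - start;
        System.out.println(count + " entries scanned in " + ms + " ms");

        // BatchScanner variant (N = 1, 5 or 10 threads), covering the whole table:
        // BatchScanner bs = conn.createBatchScanner("random_rows", Authorizations.EMPTY, N);
        // bs.setRanges(Collections.singleton(new Range()));
      }
    }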

I can provide the code that we used as well as information about our
cluster configuration if you want.

Thanks,
Mario

-- 
Mario Pastorelli | TERALYTICS

*software engineer*

Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone: +41794381682
email: mario.pastorelli@teralytics.ch
www.teralytics.net

Company registration number: CH-020.3.037.709-7 | Trade register Canton
Zurich
Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
de Vries

This e-mail message contains confidential information which is for the sole
attention and use of the intended recipient. Please notify us at once if
you think that it may not be intended for you and delete it immediately.

Re: HDFS vs Accumulo Performance

Posted by Keith Turner <ke...@deenlo.com>.
There is also the setBatchSize method [1] on the scanner.  I think
that defaults to 1000.  You could try adjusting that.

[1]: http://accumulo.apache.org/1.8/apidocs/org/apache/accumulo/core/client/Scanner.html#setBatchSize%28int%29
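
A rough sketch of what I mean, assuming you already have a Connector named
conn and a table called random_rows (both placeholders):

    Scanner scanner = conn.createScanner("random_rows", Authorizations.EMPTY);
    scanner.setBatchSize(10000); // larger than the (I think) default of 1000; fewer RPCs per scan
    for (Map.Entry<Key,Value> entry : scanner) {
      // consume entries
    }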

On Mon, Dec 5, 2016 at 12:10 PM, Mario Pastorelli
<ma...@teralytics.ch> wrote:
> table.scan.max.memory doesn't affect the number of seeks in our case. We
> tried with 1MB and 2MB.
>
> On Mon, Dec 5, 2016 at 5:59 PM, Josh Elser <jo...@gmail.com> wrote:
>>
>> If you're only ever doing sequential scans, IMO, it's expected that HDFS
>> would be faster. Remember that, architecturally, Accumulo is designed for
>> *random-read/write* workloads. This is where it would shine in comparison to
>> HDFS. Accumulo is always going to have a hit in sequential read/write
>> workloads over HDFS.
>>
>> As to your question about the number of seeks, try playing with the value
>> of "table.scan.max.memory" [1]. You should be able to easily twiddle the
>> value in the Accumulo shell and re-run the test. Accumulo tears down these
>> active scans because it expects that your client would be taking time to
>> process the results it just sent and it would want to not hold onto those in
>> memory (as your client may not come back). Increasing that property will
>> increase the amount of data sent in one RPC which in turn will reduce the
>> number of RPCs and seeks. Aside: I think this server-side "scanner" lifetime
>> is something that we'd want to revisit sooner rather than later.
>>
>> 25MB/s seems like a pretty reasonable read rate for one TabletServer
>> (since you only have one tablet). That is also why a BatchScanner would have
>> made no difference: BatchScanners parallelize access to multiple Tablets and
>> would add nothing but overhead when you read from a single Tablet.
>>
>> [1]
>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_table_scan_max_memory
>>
>> Mario Pastorelli wrote:
>>>
>>> We are trying to understand Accumulo performance to better plan our
>>> future products that use it and we noticed that the read speed of
>>> Accumulo tends to be way lower than what we would expect. We have a
>>> testing cluster with 4 HDFS+Accumulo nodes and we ran some tests. We
>>> wrote two programs to write to HDFS and Accumulo and two programs to
>>> read/scan from HDFS and Accumulo the same number of records containing
>>> random bytes. We ran all the programs from outside the cluster, on
>>> another node in the rack that runs neither HDFS nor Accumulo.
>>>
>>> We also wrote all the HDFS blocks and Accumulo tablets to the same
>>> machine in the cluster.
>>>
>>>
>>> First of all, we wrote 10M entries to HDFS, where each entry was 50 bytes.
>>> This resulted in 4 blocks on HDFS. Reading these records with an
>>> FSDataInputStream takes around 5.7 seconds with an average speed of
>>> around 90MB per second.
>>>
>>>
>>> Then we wrote 10M entries to Accumulo, where each entry has a row of 50
>>> random bytes, no column and no value. Writing is as fast as writing to
>>> HDFS modulo the compaction that we run at the end. The generated table
>>> has 1 tablet and therefore 10M records all on the same node. We
>>> waited for the compaction to finish, then we opened a scanner without
>>> setting the range and we read all the records. This time, reading the
>>> data took around 20 seconds with an average speed of 25MB/s and 500000
>>> records/s together with ~500 seeks/s. We have three questions about this
>>> result:
>>>
>>>
>>> 1 - Is this kind of performance expected?
>>>
>>> 2 - Is there any configuration that we can change to improve the scan
>>> speed?
>>>
>>> 3 - Why are there ~500 seeks per second if there is only one tablet and we
>>> read all its bytes sequentially? What are those seeks doing?
>>>
>>>
>>> We tried to use a BatchScanner with 1, 5 and 10 threads but the speed
>>> was the same or even worse in some cases.
>>>
>>>
>>> I can provide the code that we used as well as information about our
>>> cluster configuration if you want.
>>>
>>> Thanks,
>>>
>>> Mario
>>>
>>> --
>>> Mario Pastorelli| TERALYTICS
>>>
>>> *software engineer*
>>>
>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>> phone:+41794381682
>>> email: mario.pastorelli@teralytics.ch
>>> <ma...@teralytics.ch>
>>> www.teralytics.net <http://www.teralytics.net/>
>>>
>>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>>> Zurich
>>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>>> Yann de Vries
>>>
>>> This e-mail message contains confidential information which is for the
>>> sole attention and use of the intended recipient. Please notify us at
>>> once if you think that it may not be intended for you and delete it
>>> immediately.
>>>
>
>
>
> --
> Mario Pastorelli | TERALYTICS
>
> software engineer
>
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> phone: +41794381682
> email: mario.pastorelli@teralytics.ch
> www.teralytics.net
>
> Company registration number: CH-020.3.037.709-7 | Trade register Canton
> Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
> de Vries
>
> This e-mail message contains confidential information which is for the sole
> attention and use of the intended recipient. Please notify us at once if you
> think that it may not be intended for you and delete it immediately.

Re: HDFS vs Accumulo Performance

Posted by Mario Pastorelli <ma...@teralytics.ch>.
table.scan.max.memory doesn't affect the number of seeks in our case. We
tried with 1MB and 2MB.

On Mon, Dec 5, 2016 at 5:59 PM, Josh Elser <jo...@gmail.com> wrote:

> If you're only ever doing sequential scans, IMO, it's expected that HDFS
> would be faster. Remember that, architecturally, Accumulo is designed for
> *random-read/write* workloads. This is where it would shine in comparison
> to HDFS. Accumulo is always going to have a hit in sequential read/write
> workloads over HDFS.
>
> As to your question about the number of seeks, try playing with the value
> of "table.scan.max.memory" [1]. You should be able to easily twiddle the
> value in the Accumulo shell and re-run the test. Accumulo tears down these
> active scans because it expects that your client would be taking time to
> process the results it just sent and it would want to not hold onto those
> in memory (as your client may not come back). Increasing that property will
> increase the amount of data sent in one RPC which in turn will reduce the
> number of RPCs and seeks. Aside: I think this server-side "scanner"
> lifetime is something that we'd want to revisit sooner rather than later.
>
> 25MB/s seems like a pretty reasonable read rate for one TabletServer
> (since you only have one tablet). That is also why a BatchScanner would have
> made no difference: BatchScanners parallelize access to multiple Tablets
> and would add nothing but overhead when you read from a single Tablet.
>
> [1] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_table_scan_max_memory
>
> Mario Pastorelli wrote:
>
>> We are trying to understand Accumulo performance to better plan our
>> future products that use it and we noticed that the read speed of
>> Accumulo tends to be way lower than what we would expect. We have a
>> testing cluster with 4 HDFS+Accumulo nodes and we ran some tests. We
>> wrote two programs to write to HDFS and Accumulo and two programs to
>> read/scan from HDFS and Accumulo the same number of records containing
>> random bytes. We ran all the programs from outside the cluster, on
>> another node in the rack that runs neither HDFS nor Accumulo.
>>
>> We also wrote all the HDFS blocks and Accumulo tablets to the same
>> machine in the cluster.
>>
>>
>> First of all, we wrote 10M entries to HDFS, where each entry was 50 bytes.
>> This resulted in 4 blocks on HDFS. Reading these records with an
>> FSDataInputStream takes around 5.7 seconds with an average speed of
>> around 90MB per second.
>>
>>
>> Then we wrote 10M entries to Accumulo, where each entry has a row of 50
>> random bytes, no column and no value. Writing is as fast as writing to
>> HDFS modulo the compaction that we run at the end. The generated table
>> has 1 tablet and therefore 10M records all on the same node. We
>> waited for the compaction to finish, then we opened a scanner without
>> setting the range and we read all the records. This time, reading the
>> data took around 20 seconds with an average speed of 25MB/s and 500000
>> records/s together with ~500 seeks/s. We have three questions about this
>> result:
>>
>>
>> 1 - Is this kind of performance expected?
>>
>> 2 - Is there any configuration that we can change to improve the scan
>> speed?
>>
>> 3 - Why are there ~500 seeks per second if there is only one tablet and we
>> read all its bytes sequentially? What are those seeks doing?
>>
>>
>> We tried to use a BatchScanner with 1, 5 and 10 threads but the speed
>> was the same or even worse in some cases.
>>
>>
>> I can provide the code that we used as well as information about our
>> cluster configuration if you want.
>>
>> Thanks,
>>
>> Mario
>>
>> --
>> Mario Pastorelli| TERALYTICS
>>
>> *software engineer*
>>
>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>> phone:+41794381682
>> email: mario.pastorelli@teralytics.ch
>> <ma...@teralytics.ch>
>> www.teralytics.net <http://www.teralytics.net/>
>>
>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>> Zurich
>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>> Yann de Vries
>>
>> This e-mail message contains confidential information which is for the
>> sole attention and use of the intended recipient. Please notify us at
>> once if you think that it may not be intended for you and delete it
>> immediately.
>>
>>


-- 
Mario Pastorelli | TERALYTICS

*software engineer*

Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone: +41794381682
email: mario.pastorelli@teralytics.ch
www.teralytics.net

Company registration number: CH-020.3.037.709-7 | Trade register Canton
Zurich
Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
de Vries

This e-mail message contains confidential information which is for the sole
attention and use of the intended recipient. Please notify us at once if
you think that it may not be intended for you and delete it immediately.

Re: HDFS vs Accumulo Performance

Posted by Josh Elser <jo...@gmail.com>.
If you're only ever doing sequential scans, IMO, it's expected that HDFS 
would be faster. Remember that, architecturally, Accumulo is designed 
for *random-read/write* workloads. This is where it would shine in 
comparison to HDFS. Accumulo is always going to have a hit in sequential 
read/write workloads over HDFS.

As to your question about the number of seeks, try playing with the 
value of "table.scan.max.memory" [1]. You should be able to easily 
twiddle the value in the Accumulo shell and re-run the test. Accumulo 
tears down these active scans because it expects that your client would 
be taking time to process the results it just sent and it would want to 
not hold onto those in memory (as your client may not come back). 
Increasing that property will increase the amount of data sent in one 
RPC which in turn will reduce the number of RPCs and seeks. Aside: I 
think this server-side "scanner" lifetime is something that we'd want to
revisit sooner rather than later.

25MB/s seems like a pretty reasonable read rate for one TabletServer
(since you only have one tablet). That is also why a BatchScanner would
have made no difference: BatchScanners parallelize access to multiple
Tablets and would add nothing but overhead when you read from a single
Tablet.

[1] 
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_table_scan_max_memory
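
For example, in the shell something like "config -t <tablename> -s
table.scan.max.memory=4M", or from client code (a rough sketch; the table
name and the 4M value are only illustrations, assuming a Connector named conn):

    // Raise the per-batch scan memory for the test table, then re-run the scan.
    conn.tableOperations().setProperty("random_rows", "table.scan.max.memory", "4M");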

Mario Pastorelli wrote:
> We are trying to understand Accumulo performance to better plan our
> future products that use it and we noticed that the read speed of
> Accumulo tends to be way lower than what we would expect. We have a
> testing cluster with 4 HDFS+Accumulo nodes and we ran some tests. We
> wrote two programs to write to HDFS and Accumulo and two programs to
> read/scan from HDFS and Accumulo the same number of records containing
> random bytes. We ran all the programs from outside the cluster, on
> another node in the rack that runs neither HDFS nor Accumulo.
>
> We also wrote all the HDFS blocks and Accumulo tablets to the same
> machine in the cluster.
>
>
> First of all, we wrote 10M entries to HDFS, where each entry was 50 bytes.
> This resulted in 4 blocks on HDFS. Reading these records with an
> FSDataInputStream takes around 5.7 seconds with an average speed of
> around 90MB per second.
>
>
> Then we wrote 10M entries to Accumulo, where each entry has a row of 50
> random bytes, no column and no value. Writing is as fast as writing to
> HDFS modulo the compaction that we run at the end. The generated table
> has 1 tablet and therefore 10M records all on the same node. We
> waited for the compaction to finish, then we opened a scanner without
> setting the range and we read all the records. This time, reading the
> data took around 20 seconds with an average speed of 25MB/s and 500000
> records/s together with ~500 seeks/s. We have three questions about this
> result:
>
>
> 1 - Is this kind of performance expected?
>
> 2 - Is there any configuration that we can change to improve the scan speed?
>
> 3 - Why are there ~500 seeks per second if there is only one tablet and we
> read all its bytes sequentially? What are those seeks doing?
>
>
> We tried to use a BatchScanner with 1, 5 and 10 threads but the speed
> was the same or even worse in some cases.
>
>
> I can provide the code that we used as well as information about our
> cluster configuration if you want.
>
> Thanks,
>
> Mario
>
> --
> Mario Pastorelli| TERALYTICS
>
> *software engineer*
>
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> phone:+41794381682
> email: mario.pastorelli@teralytics.ch
> <ma...@teralytics.ch>
> www.teralytics.net <http://www.teralytics.net/>
>
> Company registration number: CH-020.3.037.709-7 | Trade register Canton
> Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
> Yann de Vries
>
> This e-mail message contains confidential information which is for the
> sole attention and use of the intended recipient. Please notify us at
> once if you think that it may not be intended for you and delete it
> immediately.
>

Re: HDFS vs Accumulo Performance

Posted by Keith Turner <ke...@deenlo.com>.
Mario,

Nice test!  One possible reason for the difference is the
de-serialization, iterator, and re-serialization cost incurred in the
tserver case, but I am not sure if this is the cause.  For the HDFS
case, blobs of serialized data are read from HDFS and shipped to the
client.  When you run the test, what does the CPU utilization on the
tserver and the client look like?

The seeks are caused by the fact that the scanner fetches batches of data;
it needs to seek for each batch.  The batch size can be adjusted
with table.scan.max.memory [1].  For what it's worth, the numbers roughly
line up: 10M entries in ~1000-entry batches is ~10,000 batches, which over
~20 seconds is about the ~500 seeks/s you observed.

Keith

[1]: http://accumulo.apache.org/1.8/accumulo_user_manual.html#_table_scan_max_memory

On Mon, Dec 5, 2016 at 11:37 AM, Mario Pastorelli
<ma...@teralytics.ch> wrote:
> We are trying to understand Accumulo performance to better plan our future
> products that use it and we noticed that the read speed of Accumulo tends to
> be way lower than what we would expect. We have a testing cluster with 4
> HDFS+Accumulo nodes and we ran some tests. We wrote two programs to write to
> HDFS and Accumulo and two programs to read/scan from HDFS and Accumulo the
> same number of records containing random bytes. We ran all the programs from
> outside the cluster, on another node in the rack that runs neither HDFS nor
> Accumulo.
>
> We also wrote all the HDFS blocks and Accumulo tablets to the same machine
> in the cluster.
>
>
> First of all, we wrote 10M entries to HDFS, where each entry was 50 bytes.
> This resulted in 4 blocks on HDFS. Reading these records with an
> FSDataInputStream takes around 5.7 seconds with an average speed of around
> 90MB per second.
>
>
> Then we wrote 10M entries to Accumulo, where each entry has a row of 50 random
> bytes, no column and no value. Writing is as fast as writing to HDFS modulo
> the compaction that we run at the end. The generated table has 1 tablet and
> therefore 10M records all on the same node. We waited for the compaction
> to finish, then we opened a scanner without setting the range and we read
> all the records. This time, reading the data took around 20 seconds with an
> average speed of 25MB/s and 500000 records/s together with ~500 seeks/s. We
> have three questions about this result:
>
>
> 1 - Is this kind of performance expected?
>
> 2 - Is there any configuration that we can change to improve the scan speed?
>
> 3 - Why are there ~500 seeks per second if there is only one tablet and we read
> all its bytes sequentially? What are those seeks doing?
>
>
> We tried to use a BatchScanner with 1, 5 and 10 threads but the speed was
> the same or even worse in some cases.
>
>
> I can provide the code that we used as well as information about our cluster
> configuration if you want.
>
> Thanks,
>
> Mario
>
> --
> Mario Pastorelli | TERALYTICS
>
> software engineer
>
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> phone: +41794381682
> email: mario.pastorelli@teralytics.ch
> www.teralytics.net
>
> Company registration number: CH-020.3.037.709-7 | Trade register Canton
> Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
> de Vries
>
> This e-mail message contains confidential information which is for the sole
> attention and use of the intended recipient. Please notify us at once if you
> think that it may not be intended for you and delete it immediately.