Posted to dev@hbase.apache.org by Sergey Semenoff <bo...@gmail.com> on 2020/09/16 09:41:35 UTC

Improvement - speed up HBase to 2-3 times

Hi *!

I think everybody who works with real big data knows that performance is
very important.

Unfortunately, our lovely HBase is approximately 2 times slower than
Cassandra when reading huge amounts of data.

For example, this is a Cassandra performance test run from 2 hosts
(client side):

Host1 - Throughput(ops/sec), 231 021
Host2 - Throughput(ops/sec), 224 691

Total ~450 000.

Under the same conditions HBase shows only 210 000.

Maybe this is one of the reasons why Cassandra is more popular (see
https://db-engines.com/en/ranking/wide+column+store)

I've made an improvement which can make HBase 2-3 times faster (it
depends on many factors, and sometimes it is even faster).

With the improvement, HBase speeds up to 430 000 ops/sec.
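
As a quick check of the figures in this message, the sketch below adds up
the two Cassandra client throughputs and expresses the HBase numbers as
ratios. The variable names are illustrative only, and the values are the
rounded ones quoted here.

```python
# Figures quoted in this message, in ops/sec. Names are illustrative only.
cassandra_hosts = [231_021, 224_691]  # Host1, Host2 (client side)
hbase_before = 210_000                # HBase under the same conditions
hbase_after = 430_000                 # HBase with the improvement

# Aggregate Cassandra throughput across both client hosts.
cassandra_total = sum(cassandra_hosts)

# How far HBase was behind Cassandra before the change.
gap = cassandra_total / hbase_before

# Gain from the improvement, relative to unmodified HBase.
speedup = hbase_after / hbase_before

print(f"Cassandra total: {cassandra_total} ops/sec")
print(f"Cassandra vs HBase before: {gap:.2f}x")
print(f"HBase after vs before: {speedup:.2f}x")
```

For this particular test the gain works out to roughly 2x, the lower end
of the 2-3 times range.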

See the picture in the attachment.

If you are interested in getting this improvement into a release, you can
help attract developers' attention here -
https://issues.apache.org/jira/browse/HBASE-23887

Leave a comment there with your opinion and vote if you think it could be
useful for your work.

I believe a discussion of this approach can make HBase more useful and
popular.

Thanks for your attention)

With best regards,

Pustota

Re: Improvement - speed up HBase to 2-3 times

Posted by Viraj Jasani <vj...@apache.org>.

I am planning to commit PR#2934 48 hours from now. If anyone would
like to take a look in the meantime, please let me know on this thread or
on Jira and I will wait until the review is complete.

FYI, previous reviews took place on the old PR#1257 and on Jira HBASE-23887
itself.

Thanks


On Mon, 8 Feb 2021 at 11:09 PM, Viraj Jasani <vj...@apache.org> wrote:

> Thanks for working through the feedback provided on the HBASE-23887 Jira and
> creating this new PR [1] with the new L1 cache: AdaptiveLRU. Really appreciate
> your efforts!!
> I just had a high-level look today; the structure looks great, and I will
> spend some time (day after) tomorrow on a detailed review.
>
> Requesting other reviewers to take a look at this nice new PR [1].
> Thanks
>
> 1. https://github.com/apache/hbase/pull/2934

Re: Improvement - speed up HBase to 2-3 times

Posted by Viraj Jasani <vj...@apache.org>.

Thanks for working through the feedback provided on the HBASE-23887 Jira and
creating this new PR [1] with the new L1 cache: AdaptiveLRU. Really appreciate
your efforts!!
I just had a high-level look today; the structure looks great, and I will
spend some time (day after) tomorrow on a detailed review.

Requesting other reviewers to take a look at this nice new PR [1].
Thanks

1. https://github.com/apache/hbase/pull/2934

On 2021/01/07 19:03:54, Sergey Semenoff <bo...@gmail.com> wrote: 
> Hello, guys!
> 
> Sorry for bothering you with such a small thing as increasing performance
> up to 3 times)) I am not sure how much time is normal for a PR to be
> considered in open source projects; if I am too persistent, please forgive me.
>
> Maybe someone will have time to take a look at the proposed improvement:
> https://github.com/apache/hbase/pull/1257
> 
> Thanks)

Re: Improvement - speed up HBase to 2-3 times

Posted by Sergey Semenoff <bo...@gmail.com>.

Hello, guys!

Sorry for bothering you with such a small thing as increasing performance
up to 3 times)) I am not sure how much time is normal for a PR to be
considered in open source projects; if I am too persistent, please forgive me.

Maybe someone will have time to take a look at the proposed improvement:
https://github.com/apache/hbase/pull/1257

Thanks)


Re: Improvement - speed up HBase to 2-3 times

Posted by Sergey Semenoff <bo...@gmail.com>.

I used the YCSB utility, which reports ops/sec. Each operation just gets
one random record. Full results below:

Host1

[OVERALL], RunTime(ms), 267033
[OVERALL], Throughput(ops/sec), 224691.33028502096
[TOTAL_GCS_PS_Scavenge], Count, 98
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 2056
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.7699422917766717
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 98
[TOTAL_GC_TIME], Time(ms), 2056
[TOTAL_GC_TIME_%], Time(%), 0.7699422917766717
[READ], Operations, 60000000
[READ], AverageLatency(us), 876.4223452166667
[READ], MinLatency(us), 151
[READ], MaxLatency(us), 236159
[READ], 95thPercentileLatency(us), 1298
[READ], 99thPercentileLatency(us), 2571
[READ], Return=OK, 60000000

--

Host2

[OVERALL], RunTime(ms), 259716
[OVERALL], Throughput(ops/sec), 231021.5774153306
[TOTAL_GCS_PS_Scavenge], Count, 142
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 2342
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.9017542238445071
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 142
[TOTAL_GC_TIME], Time(ms), 2342
[TOTAL_GC_TIME_%], Time(%), 0.9017542238445071
[READ], Operations, 60000000
[READ], AverageLatency(us), 851.8991412666667
[READ], MinLatency(us), 163
[READ], MaxLatency(us), 710655
[READ], 95thPercentileLatency(us), 1208
[READ], 99thPercentileLatency(us), 2163
[READ], Return=OK, 60000000

The clients were remote.
Server side: 4 hosts (E-2698 v4 2.2 GHz / 40 cores)
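
The throughput and GC percentage lines in the YCSB output can be reproduced
from the operation count and run time. The following is a minimal sketch
using only the numbers reported in this message; the function names are
made up for illustration and are not part of YCSB.

```python
def throughput_ops_per_sec(operations: int, runtime_ms: int) -> float:
    """Operations completed per second of wall-clock run time."""
    return operations / (runtime_ms / 1000.0)


def gc_time_percent(gc_time_ms: int, runtime_ms: int) -> float:
    """Share of the run time spent in garbage collection, in percent."""
    return 100.0 * gc_time_ms / runtime_ms


# Values copied from the [OVERALL], [TOTAL_GC_TIME] and [READ] lines.
runs = {
    "Host1": {"operations": 60_000_000, "runtime_ms": 267_033, "gc_ms": 2_056},
    "Host2": {"operations": 60_000_000, "runtime_ms": 259_716, "gc_ms": 2_342},
}

total = 0.0
for host, run in runs.items():
    ops = throughput_ops_per_sec(run["operations"], run["runtime_ms"])
    gc = gc_time_percent(run["gc_ms"], run["runtime_ms"])
    total += ops
    print(f"{host}: {ops:.2f} ops/sec, GC {gc:.2f}% of run time")

# Combined throughput of both client hosts.
print(f"Total: {total:.2f} ops/sec")
```

The combined figure is what the first message in this thread rounds to
~450 000 ops/sec.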





Wed, 16 Sep 2020, 13:43 onmstester onmstester
<on...@zoho.com.invalid>:

> Hi,
>
>
>
> By ops/sec, do you mean rows/sec or partitions/sec (in Cassandra terms)?
> If so, how many rows per op or partition? What's your data model and the
> host spec?
>
> Is your client remote or on the host?

Re: Improvement - speed up HBase to 2-3 times

Posted by onmstester onmstester <on...@zoho.com.INVALID>.

Hi,

By ops/sec, do you mean rows/sec or partitions/sec (in Cassandra terms)? If so, how many rows per op or partition? What's your data model and the host spec?

Is your client remote or on the host?
