Posted to issues@hbase.apache.org by "ramkrishna.s.vasudevan (Jira)" <ji...@apache.org> on 2020/12/16 05:35:00 UTC

[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

    [ https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250107#comment-17250107 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:
------------------------------------------------

[~stack]

What we see from our tests is that even with HBASE-24850, bulk load performance improves by 15% but still does not match or outperform 1.x. Without HBASE-24850 we are around 40 to 45% slower than 1.x.

The reason is that with HBASE-24850 the inlining takes effect, but we still have branching. If you look at PutSortReducer, it handles a single row at a time: the reduce() API creates a map, adds the row's data to it, and writes it out. Even with 300 columns we do this row by row, so the inlining-related optimization done in HBASE-24850 does not kick in fully. In contrast, when we use a KVComparator with no branching, combined with inlining, we are able to outperform 1.3.
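To make the per-row pattern concrete, here is a simplified sketch of what a reducer like PutSortReducer does for each row: collect that row's cells into a sorted set and emit them in comparator order. `PerRowSortSketch`, `SimpleCell`, `CELL_ORDER`, `sortRow` and the `COMPARISONS` counter are hypothetical names, not HBase classes, and Strings stand in for byte[] to keep it short. The point it illustrates is that every insertion into the sorted set calls the comparator roughly log2(n) times, so with ~300 columns per row the comparator is squarely on the hot path and any branching inside it is paid on every call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the per-row sort a bulk-load reducer performs.
// Not HBase's PutSortReducer: Strings stand in for byte[] to keep the example short.
final class PerRowSortSketch {

    // Hypothetical stand-in for an HBase Cell (one column value of one row).
    static final class SimpleCell {
        final String row, family, qualifier, value;
        final long timestamp;

        SimpleCell(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Counts comparator calls so the per-row cost is visible.
    static final AtomicLong COMPARISONS = new AtomicLong();

    // Row, family, qualifier ascending; timestamp descending (newest first).
    static final Comparator<SimpleCell> CELL_ORDER = (a, b) -> {
        COMPARISONS.incrementAndGet();
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    // Analogous to one reduce() call: a fresh sorted set per row, then emit in sorted order.
    static List<SimpleCell> sortRow(List<SimpleCell> cellsOfOneRow) {
        TreeSet<SimpleCell> sorted = new TreeSet<>(CELL_ORDER);
        sorted.addAll(cellsOfOneRow);
        return new ArrayList<>(sorted); // a real reducer would write each cell out here instead
    }

    public static void main(String[] args) {
        List<SimpleCell> row = new ArrayList<>();
        for (int i = 299; i >= 0; i--) { // 300 columns arriving in reverse qualifier order
            row.add(new SimpleCell("r1", "cf", String.format("q%03d", i), 1L, "111"));
        }
        List<SimpleCell> out = sortRow(row);
        System.out.println(out.get(0).qualifier + " .. " + out.get(out.size() - 1).qualifier);
        System.out.println("comparator calls for one row: " + COMPARISONS.get());
    }
}
```

Running `main` prints the first and last qualifier after sorting and the total number of comparator calls for that one 300-column row, which is the cost that repeats for every row in the job.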

So this suggests that we might need a pure KVComparator for bulk-load types of use cases. Thoughts?
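A "pure" comparator in the sense used above could look roughly like the following: it accepts only one concrete key representation and walks it with a single code path, so there is no per-call dispatch on the cell implementation. This is a hypothetical sketch, not HBase's KVComparator or CellComparatorImpl: `FlatKeyComparator` and its `key(...)` helper are invented names, and the byte layout is modelled loosely on the key part of the KeyValue format (rowLen(2) row famLen(1) family qualifier timestamp(8) type(1)).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical "pure" comparator over ONE flat key layout, loosely modelled on the
// KeyValue key: rowLen(2) row famLen(1) family qualifier timestamp(8) type(1).
// Not HBase's KVComparator; it only shows a single, monomorphic compare path.
final class FlatKeyComparator implements Comparator<byte[]> {

    static final FlatKeyComparator INSTANCE = new FlatKeyComparator();

    // Hypothetical helper that serializes a key in the layout above (big-endian).
    static byte[] key(byte[] row, byte[] family, byte[] qualifier, long ts, byte type) {
        ByteBuffer buf = ByteBuffer.allocate(2 + row.length + 1 + family.length + qualifier.length + 8 + 1);
        buf.putShort((short) row.length).put(row);
        buf.put((byte) family.length).put(family);
        buf.put(qualifier);
        buf.putLong(ts).put(type);
        return buf.array();
    }

    @Override
    public int compare(byte[] a, byte[] b) {
        // Row: 2-byte length prefix, bytes compared as unsigned.
        int aRowLen = ((a[0] & 0xff) << 8) | (a[1] & 0xff);
        int bRowLen = ((b[0] & 0xff) << 8) | (b[1] & 0xff);
        int c = Arrays.compareUnsigned(a, 2, 2 + aRowLen, b, 2, 2 + bRowLen);
        if (c != 0) return c;

        // Family: 1-byte length prefix right after the row.
        int aFamOff = 2 + aRowLen + 1, bFamOff = 2 + bRowLen + 1;
        int aFamLen = a[aFamOff - 1] & 0xff, bFamLen = b[bFamOff - 1] & 0xff;
        c = Arrays.compareUnsigned(a, aFamOff, aFamOff + aFamLen, b, bFamOff, bFamOff + bFamLen);
        if (c != 0) return c;

        // Qualifier: everything between the family and the trailing timestamp + type (9 bytes).
        int aQualOff = aFamOff + aFamLen, bQualOff = bFamOff + bFamLen;
        c = Arrays.compareUnsigned(a, aQualOff, a.length - 9, b, bQualOff, b.length - 9);
        if (c != 0) return c;

        // Timestamp: newer (larger) first.
        long aTs = ByteBuffer.wrap(a, a.length - 9, 8).getLong();
        long bTs = ByteBuffer.wrap(b, b.length - 9, 8).getLong();
        c = Long.compare(bTs, aTs);
        if (c != 0) return c;

        // Type byte: descending, so higher type codes sort first at the same timestamp.
        return Integer.compare(b[b.length - 1] & 0xff, a[a.length - 1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] row = "r1".getBytes(StandardCharsets.UTF_8);
        byte[] fam = "cf".getBytes(StandardCharsets.UTF_8);
        List<byte[]> keys = new ArrayList<>();
        // 4 is the Put type code in HBase's KeyValue.Type.
        keys.add(key(row, fam, "q2".getBytes(StandardCharsets.UTF_8), 1L, (byte) 4));
        keys.add(key(row, fam, "q1".getBytes(StandardCharsets.UTF_8), 1L, (byte) 4));
        keys.add(key(row, fam, "q1".getBytes(StandardCharsets.UTF_8), 7L, (byte) 4));
        keys.sort(INSTANCE);
        for (byte[] k : keys) {
            System.out.println("ts=" + ByteBuffer.wrap(k, k.length - 9, 8).getLong());
        }
    }
}
```

The trade-off is generality: a comparator like this only works if every cell reaching the sort is already in that one serialized layout, which is plausible for a bulk-load reducer that builds its own KeyValues but not for the general read path. The comment reports that a comparator of this kind, together with inlining, beat 1.3 in their tests; this sketch does not reproduce that measurement.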

 

> Bulk load performance is degraded in HBase 2 
> ---------------------------------------------
>
>                 Key: HBASE-24754
>                 URL: https://issues.apache.org/jira/browse/HBASE-24754
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 2.2.3
>            Reporter: Ajeet Rai
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0
>
>         Attachments: Branc2_withComparator_atKeyValue.patch, Branch1.3_putSortReducer_sampleCode.patch, Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, we observed that bulk load performance is degraded in HBase 2.
> Test input:
> 1: Table with 500 regions (300 column families)
> 2: Data = 2 TB
> Data sample:
> 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111
> 3: Cluster: 7 nodes (2 masters + 5 RegionServers)
> 4: The number of containers launched is the same in both cases
> HBase 2 took about 10% more time than HBase 1.3, with the same test input on both clusters.
>  
> |Feature|HBase 2.2.3 Time (sec)|HBase 1.3.1 Time (sec)|Diff %|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4, HBase 1.3.1: 1.4|
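Working the numbers in the table, the Diff % column appears to be measured relative to the HBase 1.3.1 time (an inference from the figures, not stated in the report):

```latex
\text{Diff\%} = \frac{T_{1.3.1} - T_{2.2.3}}{T_{1.3.1}} \times 100
             = \frac{19686.16 - 21837}{19686.16} \times 100
             \approx -10.93
```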



--
This message was sent by Atlassian Jira
(v8.3.4#803005)