Posted to issues@hbase.apache.org by "ramkrishna.s.vasudevan (Jira)" <ji...@apache.org> on 2020/12/16 05:35:00 UTC
[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2
[ https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250107#comment-17250107 ]
ramkrishna.s.vasudevan commented on HBASE-24754:
------------------------------------------------
[~stack]
What we want to note from our tests is that even with HBASE-24850, bulk load performance improves by 15% but still does not match or outperform 1.x. Without HBASE-24850 we are around 40 to 45% slower.
The reason is that with HBASE-24850 the inlining takes effect, but we still have branching. If you look at PutSortReducer, it handles a single row at a time: the reduce() API creates a map, adds the row's data to it, and writes it out. Even if we have 300 cols, we do that row by row, so the inlining optimization we did in HBASE-24850 does not kick in fully. When we instead use a KVComparator that has no branching and is also inlined, we are able to outperform 1.3.
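As a rough illustration of the per-row pattern described above, here is a minimal, self-contained sketch. It does not use HBase classes: `SimpleKv`, `PURE_KV_COMPARATOR` and `sortRow` are hypothetical names, the sorted collection is assumed to be a TreeSet, and "no branching" is read here as "no dispatch on the cell's concrete type", which is an interpretation rather than something stated in this comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class RowSortSketch {
    /** Hypothetical stand-in for a cell; this is NOT HBase's KeyValue. */
    static final class SimpleKv {
        final byte[] row;
        final byte[] qualifier;
        final long timestamp;

        SimpleKv(String row, String qualifier, long timestamp) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
            this.timestamp = timestamp;
        }
    }

    /**
     * Assumed shape of a "pure" comparator: one code path over byte[] fields,
     * with no instanceof checks or per-implementation dispatch.
     */
    static final Comparator<SimpleKv> PURE_KV_COMPARATOR = (a, b) -> {
        int c = Arrays.compareUnsigned(a.row, b.row);
        if (c != 0) {
            return c;
        }
        c = Arrays.compareUnsigned(a.qualifier, b.qualifier);
        if (c != 0) {
            return c;
        }
        // Newer timestamps sort first.
        return Long.compare(b.timestamp, a.timestamp);
    };

    /** Mimics the shape of one reduce() call: a small sorted set built for a single row. */
    static List<String> sortRow(List<SimpleKv> cellsOfOneRow) {
        TreeSet<SimpleKv> sorted = new TreeSet<>(PURE_KV_COMPARATOR);
        sorted.addAll(cellsOfOneRow);
        List<String> out = new ArrayList<>();
        for (SimpleKv kv : sorted) {
            out.add(new String(kv.qualifier, StandardCharsets.UTF_8) + "@" + kv.timestamp);
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleKv> cells = Arrays.asList(
            new SimpleKv("r1", "c2", 5L),
            new SimpleKv("r1", "c1", 5L),
            new SimpleKv("r1", "c1", 9L));
        System.out.println(sortRow(cells)); // prints [c1@9, c1@5, c2@5]
    }
}
```

Because each call sorts only one row's cells, the comparator is invoked in many short bursts, so a comparator with a single simple code path is presumably easier for the JIT to inline at this call site.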
So this suggests that we might need a pure KVComparator for bulk-load type use cases. Thoughts?
> Bulk load performance is degraded in HBase 2
> ---------------------------------------------
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 2.2.3
> Reporter: Ajeet Rai
> Assignee: ramkrishna.s.vasudevan
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: Branc2_withComparator_atKeyValue.patch, Branch1.3_putSortReducer_sampleCode.patch, Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, we observed that bulk load performance is degraded in HBase 2.
> Test Input:
> 1: Table with 500 region(300 column family)
> 2: data =2 TB
> Data Sample
> 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111
> 3: Cluster: 7 nodes (2 masters + 5 Region Servers)
> 4: The number of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, with the same test input for both clusters.
>
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4; HBase 1.3.1: 1.4|
--
This message was sent by Atlassian Jira
(v8.3.4#803005)