You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2020/06/26 01:48:00 UTC

[jira] [Comment Edited] (HBASE-24637) Filter SKIP hinting regression

    [ https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145964#comment-17145964 ] 

Andrew Kyle Purtell edited comment on HBASE-24637 at 6/26/20, 1:47 AM:
-----------------------------------------------------------------------

If you'd like to play around with the instrumentation, see attached patches, for branch-1 and branch-2.2, respectively. After applying them add this to log4j.properties:

{noformat}
log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE
{noformat}

Here's an example output from a scan of {{hbase:meta}}:

{noformat}
2020-06-24 14:45:13,870 TRACE [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120] 
ipc.CallRunner: callId: 0 service: ClientService methodName: Scan size: 102
connection: 10.55.111.78:54551 deadline: 1593035713822 successful: true
metrics: [ "block_read_keys": 477 "block_read_ns": 3427040
 "block_reads": 13 "block_seek_ns": 1606370 "block_seeks": 169
 "block_unpack_ns": 10256 "block_unpacks": 13 "cells_matched": 165
 "cells_matched__hbase:meta,,1.1588230740__info": 165
 "column_hint_include": 148 "memstore_next": 72
 "memstore_next_ns": 136671 "memstore_seek": 2
 "memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17
 "sqm_hint_include": 74 "sqm_hint_seek_next_col": 74
 "store_next": 276
 "store_next__1c930a35ff8041368a05817adbdcce97": 40
 "store_next__2644194fdf794815abdc940c183dab88": 40
 "store_next__32ce31753fb244668f788fb94ab02dff": 40
 "store_next__61c8423b9d8846c99a61cd2996b5b621": 116
 "store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40
 "store_next_ns": 1891634
 "store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383
 "store_next_ns__2644194fdf794815abdc940c183dab88": 299936
 "store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594
 "store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313
 "store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408
 "store_reseek": 164
 "store_reseek__1c930a35ff8041368a05817adbdcce97": 32
 "store_reseek__2644194fdf794815abdc940c183dab88": 32
 "store_reseek__32ce31753fb244668f788fb94ab02dff": 32
 "store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36
 "store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32
 "store_reseek_ns": 2969978
 "store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489
 "store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115
 "store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642
 "store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188
 "store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544
 "store_seek": 5
 "store_seek__1c930a35ff8041368a05817adbdcce97": 1
 "store_seek__2644194fdf794815abdc940c183dab88": 1
 "store_seek__32ce31753fb244668f788fb94ab02dff": 1
 "store_seek__61c8423b9d8846c99a61cd2996b5b621": 1
 "store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1
 "store_seek_ns": 8862786
 "store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421
 "store_seek_ns__2644194fdf794815abdc940c183dab88": 585899
 "store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605
 "store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072
 "store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789
 "versions_hint_include": 74 "versions_hint_seek_next_col": 74 ]
{noformat}

You can use the attached perl script to aggregate all CallRunner trace logging for a given table. Here's an example from the PE --filterAll case over the 100 column table on HBase 2.2:

{noformat}
COUNT: 72
block_read_ns 16022572976
block_reads 1157963
block_unpack_ns 40930352
block_unpacks 1157963
cached_block_read_ns 191684215
cached_block_reads 1157330
cells_matched 1010000000
cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0 5459454
cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0 39100332
cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0 101404202
cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0 130169002
cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0 124239494
cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0 139113562
cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0 147105288
cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0 323408666
column_hint_include 1000000000
filter_hint_skip 1000000000
reseeks 1163490
seeker_next 1000000000
seeker_next_ns 92121300195
sqm_hint_done 9999992
sqm_hint_seek_next_col 990000008
sqm_hint_seek_next_row 10000000
store_next 998836518
store_next__03573d0d16474c73a315c5d2a1f25986 145479362
store_next__0d570a284b274c7686ebbaacc1786110 128730258
store_next__622d9f19372d478aadb7112dad739106 137576008
store_next__83be76bc01a74b9e9831c743e25b73b8 319835506
store_next__a74e26cbc87741689f71f44dca266c85 5398444
store_next__d43b019ef3b24abea32261fc5af2b59b 100283189
store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173
store_next__ef169c9106cd4695ba066e98e26a5367 38667578
store_next_ns 227944975778
store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224
store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635
store_next_ns__622d9f19372d478aadb7112dad739106 31970878120
store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595
store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596
store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458
store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445
store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705
store_reseek 1163490
store_reseek__03573d0d16474c73a315c5d2a1f25986 169439
store_reseek__0d570a284b274c7686ebbaacc1786110 149943
store_reseek__622d9f19372d478aadb7112dad739106 160193
store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095
store_reseek__a74e26cbc87741689f71f44dca266c85 6957
store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012
store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228
store_reseek__ef169c9106cd4695ba066e98e26a5367 45623
store_reseek_ns 4065247883
store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475
store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515
store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308
store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574
store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452
store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215
store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682
store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662
versions_hint_include_and_seek_next_col 990000000
versions_hint_include_and_seek_next_row 10000000
{noformat}


was (Author: apurtell):
If you'd like to play around with the instrumentation, see attached patches, for branch-1 and branch-2.2, respectively. After applying them add this to log4j.properties:

{noformat}
log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE
{noformat}

Here's an example output from a scan of {{hbase:meta}}:

{noformat}
2020-06-24 14:45:13,870 TRACE [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120] ipc.CallRunner: callId: 0 service: ClientService methodName: Scan size: 102 connection: 10.55.111.78:54551 deadline: 1593035713822 successful: true metrics: [ "block_read_keys": 477 "block_read_ns": 3427040 "block_reads": 13 "block_seek_ns": 1606370 "block_seeks": 169 "block_unpack_ns": 10256 "block_unpacks": 13 "cells_matched": 165 "cells_matched__hbase:meta,,1.1588230740__info": 165 "column_hint_include": 148 "memstore_next": 72 "memstore_next_ns": 136671 "memstore_seek": 2 "memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17 "sqm_hint_include": 74 "sqm_hint_seek_next_col": 74 "store_next": 276 "store_next__1c930a35ff8041368a05817adbdcce97": 40 "store_next__2644194fdf794815abdc940c183dab88": 40 "store_next__32ce31753fb244668f788fb94ab02dff": 40 "store_next__61c8423b9d8846c99a61cd2996b5b621": 116 "store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40 "store_next_ns": 1891634 "store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383 "store_next_ns__2644194fdf794815abdc940c183dab88": 299936 "store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594 "store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313 "store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408 "store_reseek": 164 "store_reseek__1c930a35ff8041368a05817adbdcce97": 32 "store_reseek__2644194fdf794815abdc940c183dab88": 32 "store_reseek__32ce31753fb244668f788fb94ab02dff": 32 "store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36 "store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32 "store_reseek_ns": 2969978 "store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489 "store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115 "store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642 "store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188 "store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544 "store_seek": 5 "store_seek__1c930a35ff8041368a05817adbdcce97": 1 "store_seek__2644194fdf794815abdc940c183dab88": 1 "store_seek__32ce31753fb244668f788fb94ab02dff": 1 "store_seek__61c8423b9d8846c99a61cd2996b5b621": 1 "store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1 "store_seek_ns": 8862786 "store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421 "store_seek_ns__2644194fdf794815abdc940c183dab88": 585899 "store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605 "store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072 "store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789 "versions_hint_include": 74 "versions_hint_seek_next_col": 74 ]
{noformat}

You can use the attached perl script to aggregate all CallRunner trace logging for a given table. Here's an example from the PE --filterAll case over the 100 column table on HBase 2.2:

{noformat}
COUNT: 72
block_read_ns 16022572976
block_reads 1157963
block_unpack_ns 40930352
block_unpacks 1157963
cached_block_read_ns 191684215
cached_block_reads 1157330
cells_matched 1010000000
cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0 5459454
cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0 39100332
cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0 101404202
cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0 130169002
cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0 124239494
cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0 139113562
cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0 147105288
cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0 323408666
column_hint_include 1000000000
filter_hint_skip 1000000000
reseeks 1163490
seeker_next 1000000000
seeker_next_ns 92121300195
sqm_hint_done 9999992
sqm_hint_seek_next_col 990000008
sqm_hint_seek_next_row 10000000
store_next 998836518
store_next__03573d0d16474c73a315c5d2a1f25986 145479362
store_next__0d570a284b274c7686ebbaacc1786110 128730258
store_next__622d9f19372d478aadb7112dad739106 137576008
store_next__83be76bc01a74b9e9831c743e25b73b8 319835506
store_next__a74e26cbc87741689f71f44dca266c85 5398444
store_next__d43b019ef3b24abea32261fc5af2b59b 100283189
store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173
store_next__ef169c9106cd4695ba066e98e26a5367 38667578
store_next_ns 227944975778
store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224
store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635
store_next_ns__622d9f19372d478aadb7112dad739106 31970878120
store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595
store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596
store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458
store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445
store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705
store_reseek 1163490
store_reseek__03573d0d16474c73a315c5d2a1f25986 169439
store_reseek__0d570a284b274c7686ebbaacc1786110 149943
store_reseek__622d9f19372d478aadb7112dad739106 160193
store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095
store_reseek__a74e26cbc87741689f71f44dca266c85 6957
store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012
store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228
store_reseek__ef169c9106cd4695ba066e98e26a5367 45623
store_reseek_ns 4065247883
store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475
store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515
store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308
store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574
store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452
store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215
store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682
store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662
versions_hint_include_and_seek_next_col 990000000
versions_hint_include_and_seek_next_row 10000000
{noformat}

> Filter SKIP hinting regression
> ------------------------------
>
>                 Key: HBASE-24637
>                 URL: https://issues.apache.org/jira/browse/HBASE-24637
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters, Performance, Scanners
>    Affects Versions: 2.2.5
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, W-7665966-Instrument-low-level-scan-details-branch-1.patch, W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative to HBase 1. Depending on the test scenario, HBase 2 can demonstrate significantly better microbenchmarks in a number of cases, and usually shows improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call metrics that leverage the fact it puts a reference to the current Call into a thread local and that all activity for a given RPC is processed by a single thread context. I then instrumented ScanQueryMatcher (in branch-1) and its various friends (in branch-2.2), StoreScanner, HFileReaderV2 and HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 and 2.2 versions under test operated on identical data files in HDFS. For tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the ability of filters to provide meaningful SKIP hints, which disables an optimization that avoids reseeking, leading to a serious and proportional regression in reseek activity and time spent in that code path. So for queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was almost identical, as measured by counts of the hint types returned, whether or not column or version trackers are called, and counts of store seeks or reseeks. Regarding micro-timings, there was a 10% variance in my testing and results generally fell within this range, except for the filter all case of course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)