Posted to issues@hbase.apache.org by "Lars Hofhansl (Jira)" <ji...@apache.org> on 2020/07/16 20:32:00 UTC

[jira] [Comment Edited] (HBASE-24637) Filter SKIP hinting regression

    [ https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159459#comment-17159459 ] 

Lars Hofhansl edited comment on HBASE-24637 at 7/16/20, 8:31 PM:
-----------------------------------------------------------------

Hmm... The SQM giving more precise SEEK hints is not necessarily wrong. It's a hint that a SEEK is *possible*.

The SKIP vs SEEK optimization I put in place a while ago then decides at the StoreScanner whether to follow that hint or not. Now, that optimization itself is not free; it adds 1 or 2 extra compares.

In HBASE-24742 I managed to remove one compare in most cases. So it might be better now, but it's still not good if we issue too many SEEK hints, for each of which we then have to decide whether to follow it or not.
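
To make the trade-off concrete, here is a toy sketch of the decision, not the actual StoreScanner code. It assumes the scanner knows the first key of the next block and follows a SEEK hint only when the target lies at or beyond that key; otherwise it keeps SKIPping inside the current block. The class name, the method names, and the use of String keys are all hypothetical and chosen for illustration only.

```java
public class SkipVsSeekSketch {
    enum Action { SKIP, SEEK }

    /** Counts key comparisons so the cost of deciding is visible. */
    static int compares = 0;

    static int compare(String a, String b) {
        compares++;
        return a.compareTo(b);
    }

    /**
     * Decide whether to follow a SEEK hint (toy model).
     *
     * @param seekTarget        key the matcher suggests seeking to
     * @param nextBlockFirstKey first key of the next block, or null if unknown
     */
    static Action decide(String seekTarget, String nextBlockFirstKey) {
        if (nextBlockFirstKey == null) {
            // Assumption of this sketch: with no block boundary to compare
            // against, simply follow the hint.
            return Action.SEEK;
        }
        // One extra compare per hint: is the target still in the current block?
        return compare(seekTarget, nextBlockFirstKey) < 0 ? Action.SKIP : Action.SEEK;
    }

    public static void main(String[] args) {
        System.out.println(decide("row0005", "row0100")); // target inside current block
        System.out.println(decide("row0200", "row0100")); // target beyond current block
        System.out.println("compares spent deciding: " + compares);
    }
}
```

Every SEEK hint pays for the compare whether or not the seek is taken, which is why a matcher that issues many hints can cost more than one that simply returns SKIP.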



was (Author: lhofhansl):
Hmm... The SQM giving more precise SEEK hints is not necessarily wrong. It's a hint that a SEEK is *possible*.

The SKIP vs SEEK optimization I put in place a while ago then decides at the StoreScanner whether to follow that hint or not. Now, that optimization itself is not free; it adds one compare per Cell-version + 1 or 2 extra compares (# versions + 1 or 2 in total).

In HBASE-24742 I managed to remove one compare in most cases. So it might be better now, but it's still not good if we issue too many SEEK hints, for each of which we then have to decide whether to follow it or not.


> Filter SKIP hinting regression
> ------------------------------
>
>                 Key: HBASE-24637
>                 URL: https://issues.apache.org/jira/browse/HBASE-24637
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters, Performance, Scanners
>    Affects Versions: 2.2.5
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, W-7665966-Instrument-low-level-scan-details-branch-1.patch, W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative to HBase 1. Depending on the test scenario, HBase 2 can demonstrate significantly better microbenchmarks in a number of cases, and usually shows improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call metrics that leverage the fact that it puts a reference to the current Call into a thread local and that all activity for a given RPC is processed by a single thread context. I then instrumented ScanQueryMatcher (in branch-1) and its various friends (in branch-2.2), StoreScanner, HFileReaderV2 and HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per row were created, snapshotted, dropped, and cloned from the snapshot. Both 1.6 and 2.2 versions under test operated on identical data files in HDFS. For tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the ability of filters to provide meaningful SKIP hints, which disables an optimization that avoids reseeking, leading to a serious and proportional regression in reseek activity and time spent in that code path. So for queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was almost identical, as measured by counts of the hint types returned, whether or not column or version trackers are called, and counts of store seeks or reseeks. Regarding micro-timings, there was a 10% variance in my testing and results generally fell within this range, except for the filter all case of course. 
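
The per-call instrumentation described in the issue can be pictured with a minimal sketch like the one below. It is not the attached patch: the Call class here is a hypothetical stand-in, and the counter names are invented. It relies only on the property stated above, that all work for one RPC runs on a single thread, so a thread local can carry the counters to any code on that path.

```java
import java.util.HashMap;
import java.util.Map;

public class PerCallCounters {
    /** Hypothetical stand-in for an RPC call object; not HBase's Call class. */
    static final class Call {
        final Map<String, Long> counters = new HashMap<>();
    }

    /** The call currently being handled by this thread, if any. */
    private static final ThreadLocal<Call> CURRENT = new ThreadLocal<>();

    static void begin(Call call) {
        CURRENT.set(call);
    }

    static void end() {
        CURRENT.remove();
    }

    /** Called from instrumented code paths; a no-op outside of a call. */
    static void increment(String name) {
        Call call = CURRENT.get();
        if (call != null) {
            call.counters.merge(name, 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        Call call = new Call();
        begin(call);
        try {
            increment("reseek");
            increment("reseek");
            increment("skipHint");
        } finally {
            end();
        }
        System.out.println(call.counters);
    }
}
```

Because the counters live on the call object, instrumented classes deep in the scan path need no extra parameters, only a static increment.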


