Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2018/12/12 19:12:00 UTC

[jira] [Commented] (HBASE-17958) Avoid passing unexpected cell to ScanQueryMatcher when optimize SEEK to SKIP

    [ https://issues.apache.org/jira/browse/HBASE-17958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719318#comment-16719318 ] 

Lars Hofhansl commented on HBASE-17958:
---------------------------------------

Looking at this one again, since it popped up in the profiler again.
I tried moving the check into StoreFileScanner or HFileScanner, but unfortunately much of the cost is spent at the higher level (mostly in KeyValueHeap).

I wanted to come back to the discussion about how often we need to check the next indexed key.
While it is true that the key *may* change during heap.next(), this is just a heuristic, based on the key we're looking at, to estimate whether seeking or skipping would be more effective.
Right now we're always paying the cost of an extra compare per K/V to guard against the rare case where the scanner switches *and* the new scanner has many versions.

So I propose again moving that compare out of the loop and checking only once. That is good enough for a heuristic, and the per-cell check is not needed for correctness. In the case I'm seeing, this compare represents 40% of the time spent in StoreScanner.next().
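
A minimal sketch of the idea, not the actual StoreScanner code: cells and keys are modelled as plain ints, and every name below (SeekVsSkipSketch, targetInCurrentBlock, skipCheckingEveryCell, skipCheckingOnce) is hypothetical. It only shows how many compares against the next indexed key each variant pays while skipping forward; it assumes the next indexed key stays the same during the loop, which is exactly the common case the heuristic is betting on.

```java
final class SeekVsSkipSketch {
    // Number of compares made against the next indexed key (for illustration only).
    static int compares = 0;

    // Hypothetical stand-in for the heuristic compare: if the key we want sorts
    // before the next indexed key, it lives in the current block, so skipping
    // cell by cell is assumed to be cheaper than seeking.
    static boolean targetInCurrentBlock(int targetKey, int nextIndexedKey) {
        compares++;
        return targetKey < nextIndexedKey;
    }

    // Variant 1: re-evaluate the heuristic for every cell that is skipped.
    // Returns the index where skipping stopped.
    static int skipCheckingEveryCell(int[] cells, int from, int targetKey, int nextIndexedKey) {
        int i = from;
        while (i < cells.length && cells[i] < targetKey) {
            if (!targetInCurrentBlock(targetKey, nextIndexedKey)) {
                break; // a real scanner would issue a SEEK here
            }
            i++;
        }
        return i;
    }

    // Variant 2: evaluate the heuristic once, before the loop.
    static int skipCheckingOnce(int[] cells, int from, int targetKey, int nextIndexedKey) {
        if (!targetInCurrentBlock(targetKey, nextIndexedKey)) {
            return from; // a real scanner would issue a SEEK here
        }
        int i = from;
        while (i < cells.length && cells[i] < targetKey) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] cells = {1, 2, 3, 4, 5, 6};

        compares = 0;
        int a = skipCheckingEveryCell(cells, 0, 5, 10);
        System.out.println("per-cell check: stopped at " + a + ", compares = " + compares);

        compares = 0;
        int b = skipCheckingOnce(cells, 0, 5, 10);
        System.out.println("single check:   stopped at " + b + ", compares = " + compares);
    }
}
```

Both variants stop at the same cell when the next indexed key does not change; the second one just pays for a single compare instead of one per skipped cell.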

The gist: this is a heuristic that tries to guess whether SKIP or SEEK is better. It only has to be mostly right. I'll file a separate Jira.

> Avoid passing unexpected cell to ScanQueryMatcher when optimize SEEK to SKIP
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-17958
>                 URL: https://issues.apache.org/jira/browse/HBASE-17958
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>             Fix For: 1.4.0, 2.0.0
>
>         Attachments: 0001-add-one-ut-testWithColumnCountGetFilter.patch, 17958-add.txt, HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, HBASE-17958-v1.patch, HBASE-17958-v2.patch, HBASE-17958-v3.patch, HBASE-17958-v4.patch, HBASE-17958-v5.patch, HBASE-17958-v6.patch, HBASE-17958-v7.patch, HBASE-17958-v7.patch
>
>
> {code}
> ScanQueryMatcher.MatchCode qcode = matcher.match(cell);
> qcode = optimize(qcode, cell);
> {code}
> The optimize method may change the MatchCode from SEEK_NEXT_COL/SEEK_NEXT_ROW to SKIP, but the next cell is still passed to ScanQueryMatcher. This gives wrong results with some filters, e.g. ColumnCountGetFilter, which just counts the number of columns: if a cell from the same column is passed to this filter again, the count will be wrong. So we should avoid passing the cell to ScanQueryMatcher when optimizing SEEK to SKIP.
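
The problem described in the issue can be illustrated with a small sketch. This is not the HBase code or the committed patch: cells are reduced to their column name, CountingFilter is a hypothetical stand-in for a filter such as ColumnCountGetFilter, and the "same column as the last handled cell" check is an assumed simplification of what happens after a SEEK_NEXT_COL has been turned into a SKIP.

```java
import java.util.Arrays;
import java.util.List;

final class SeekToSkipFilterSketch {
    // Hypothetical stand-in for a filter that counts every cell it is shown.
    static final class CountingFilter {
        int seen = 0;

        void filterCell(String column) {
            seen++;
        }
    }

    // Problem case: after SEEK_NEXT_COL is downgraded to SKIP, the remaining
    // versions of the same column are still handed to the filter, so one
    // column is counted several times.
    static int countPassingEveryCell(List<String> columns) {
        CountingFilter filter = new CountingFilter();
        for (String column : columns) {
            filter.filterCell(column);
        }
        return filter.seen;
    }

    // Intended behaviour: cells that a real seek would have jumped over are
    // skipped locally and never reach the filter.
    static int countSkippingSameColumn(List<String> columns) {
        CountingFilter filter = new CountingFilter();
        String lastHandled = null;
        for (String column : columns) {
            if (column.equals(lastHandled)) {
                continue; // same column as before: skip without consulting the filter
            }
            filter.filterCell(column);
            lastHandled = column;
        }
        return filter.seen;
    }

    public static void main(String[] args) {
        // Three columns; "a" has two versions and "c" has three.
        List<String> columns = Arrays.asList("a", "a", "b", "c", "c", "c");
        System.out.println("every cell passed to filter: " + countPassingEveryCell(columns));
        System.out.println("same-column cells skipped:   " + countSkippingSameColumn(columns));
    }
}
```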



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)