Posted to issues@hbase.apache.org by "Jeongdae Kim (JIRA)" <ji...@apache.org> on 2018/11/02 11:14:00 UTC
[jira] [Comment Edited] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.

    [ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672938#comment-16672938 ] 

Jeongdae Kim edited comment on HBASE-21418 at 11/2/18 11:13 AM:
----------------------------------------------------------------

Thanks for your comments. I’ll address them in the next patch.
{quote}
Generally I am not a fan of adding more HBase and/or scan options that one has to know about. (which is why I had removed the LOOK_AHEAD hint that I myself had added a bit earlier).
{quote}
I 100% agree with you, and would like to do without options too. But as long as we use ConcurrentSkipListMap as the data structure, we don't have any information like a next block index, so I couldn’t find a nice solution without extra cost.

{quote}
Why max versions here? The SEEKing can also be an issue with many columns, right?
 
If we can, let's find a heuristic to do this automatically (like I did with HFiles), so that a user won't have to hint.
{quote}
Right, I used max versions as a heuristic for the case where users pass no hint. I had no idea what a proper heuristic would be.
If we can bear a small extra cost when putting cells into a memstore, what about maintaining some stats for columns and using them to decide whether to do seek operations or not? Let me try to make a patch for this.


was (Author: jeongdae kim):
Thanks for your comments. I’ll address them in the next patch.
{quote}
Generally I am not a fan of adding more HBase and/or scan options that one has to know about. (which is why I had removed the LOOK_AHEAD hint that I myself had added a bit earlier).
{quote}
I 100% agree with you, and would like to do without options too. But I couldn’t find a nice solution without extra cost.

{quote}
Why max versions here? The SEEKing can also be an issue with many columns, right?
 
If we can, let's find a heuristic to do this automatically (like I did with HFiles), so that a user won't have to hint.
{quote}
Right, I used max versions as a heuristic for the case where users pass no hint. I had no idea what a proper heuristic would be.
If we can bear a small extra cost when putting cells into a memstore, what about maintaining some stats for columns and using them to decide whether to do seek operations or not? Let me try to make a patch for this.

> Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21418
>                 URL: https://issues.apache.org/jira/browse/HBASE-21418
>             Project: HBase
>          Issue Type: Improvement
>          Components: scan, Scanners
>    Affects Versions: 1.2.5
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Minor
>              Labels: performance
>         Attachments: HBASE-21418.branch-1.2.001.patch, HBASE-21418.branch-1.2.001.patch
>
>
> We observed “responseTooSlow” logs for Get requests in our production clusters. Some Get requests were even answered only after 10 seconds.
> The affected Get requests used a time range, and the target rows have many columns with several versions each.
> We reproduced this issue and found that it happens only when scanning the memstore. After flushing the HStore, the slow responses disappeared and the same Get requests were all answered very quickly.
>  
> We investigated this case and found that the performance difference between the memstore scanner and the HFile scanner is caused by the number of reseek operations executed while scanning. When a store scanner needs to reseek to the next column, the HFile scanner decides whether it has to reseek by checking whether the seek point is in the current block, whereas the memstore scanner always reseeks without any such check. In our case, almost all columns in the memstore have timestamps older than the scan's (Get's) time range, so roughly as many reseek operations occur as there are columns. This sporadically increases the response time of Get requests.
>  
> To improve the reseek operation of the memstore scanner, I think it’s better to skip than to seek when a reseek is requested, if the seek point is quite close to the cell the scanner is currently pointing at. (Actually, I changed MatchCode.SEEK_NEXT_COL to MatchCode.SKIP in our case, and the response time of Get was 6x faster than before.) But we can’t decide whether the seek point is close to the current cell, because the memstore scanner has no information such as a next block index.
>  Before HBASE-13109, Scan.HINT_LOOKAHEAD was introduced to handle cases like this, and it may be deprecated someday. But I think that hint is still useful for the memstore scanner to try skipping first, before reseeking, and with this option we can make the memstore scanner's reseek operations smarter.
>  
> I tested this patch in our case, and got the same result as when I changed the match code (mentioned above).
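
For illustration only, the "try to skip first, then reseek" idea from the description can be sketched in plain Java roughly as below. This is not the attached patch and not HBase code: the class SkipBeforeReseek, the method reseek and the parameter maxSkips are hypothetical names, cells are modeled as plain strings in a ConcurrentSkipListSet, and the seek target is assumed to sort after the scanner's current position.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch, not HBase code: strings stand in for memstore cells. */
final class SkipBeforeReseek {

    /** The key the scanner lands on (null if none) and whether a full seek was needed. */
    static final class Result {
        final String key;
        final boolean usedSeek;

        Result(String key, boolean usedSeek) {
            this.key = key;
            this.usedSeek = usedSeek;
        }
    }

    /**
     * Tries to reach the first key >= target by stepping the scanner's live
     * iterator at most maxSkips times; falls back to a skip-list lookup otherwise.
     * Assumes target sorts after the iterator's current position.
     */
    static Result reseek(NavigableSet<String> cells, Iterator<String> position,
                         String target, int maxSkips) {
        for (int i = 0; i < maxSkips; i++) {
            if (!position.hasNext()) {
                // Nothing left after the current position, so no key >= target exists.
                return new Result(null, false);
            }
            String next = position.next();
            if (next.compareTo(target) >= 0) {
                return new Result(next, false); // reached the seek point by skipping
            }
        }
        // The seek point is not close by: pay for a real seek in the skip list.
        return new Result(cells.ceiling(target), true);
    }

    public static void main(String[] args) {
        NavigableSet<String> cells = new ConcurrentSkipListSet<>();
        for (String k : new String[] {"a", "b", "c", "d", "e", "f"}) {
            cells.add(k);
        }
        Iterator<String> position = cells.iterator();
        position.next(); // the scanner currently points at "a"
        Result r = reseek(cells, position, "c", 3);
        System.out.println(r.key + " usedSeek=" + r.usedSeek);
    }
}
```

With a small maxSkips the common case (seek point only a few cells away) costs a few iterator steps instead of a skip-list search, and a distant seek point wastes at most maxSkips steps before the regular seek. How to pick that bound without a user hint is exactly the open question discussed in the comment above.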


