Posted to issues@hbase.apache.org by "Chao Shi (JIRA)" <ji...@apache.org> on 2013/08/01 10:41:51 UTC

[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

    [ https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726224#comment-13726224 ] 

Chao Shi commented on HBASE-9102:
---------------------------------

I don't think the block cache should be used for this kind of prefetch, because a large sequential scan will evict the blocks that random reads depend on.
If we use the HDFS client for prefetching, we also need to implement a scanner-sticky DFSInputStream, because a seek issued by another scanner will discard all of the prefetched data.
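
To make the eviction concern concrete, here is a minimal sketch in Java. It assumes a plain entry-count LRU built on LinkedHashMap, which is not HBase's actual block cache; the class and method names (LruPollutionSketch, TinyLru, hotBlocksSurviving) are hypothetical and exist only for this illustration. It counts how many "hot" random-read blocks are still cached after a scan pushes its own blocks through the same cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT HBase's block cache implementation.
class LruPollutionSketch {

    // A tiny LRU cache keyed by block name and bounded by entry count.
    static final class TinyLru extends LinkedHashMap<String, byte[]> {
        private final int capacity;

        TinyLru(int capacity) {
            super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            // Evict the least recently used entry once the cache is over capacity.
            return size() > capacity;
        }
    }

    // Returns how many hot blocks remain cached after a scan caches scanBlocks blocks.
    static int hotBlocksSurviving(int capacity, int hotBlocks, int scanBlocks) {
        TinyLru cache = new TinyLru(capacity);
        for (int i = 0; i < hotBlocks; i++) {
            cache.put("hot-" + i, new byte[0]); // blocks used by random reads
        }
        for (int i = 0; i < scanBlocks; i++) {
            cache.put("scan-" + i, new byte[0]); // blocks touched once by the scan
        }
        int surviving = 0;
        for (int i = 0; i < hotBlocks; i++) {
            if (cache.containsKey("hot-" + i)) {
                surviving++;
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        System.out.println(hotBlocksSurviving(8, 4, 2));   // scan fits alongside hot blocks
        System.out.println(hotBlocksSurviving(8, 4, 100)); // scan is much larger than the cache
    }
}
```

In this toy model, a scan that is larger than the cache leaves none of the hot blocks behind, which is the effect I would like to avoid for the online read path.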

Another question is how we decide whether a scan is sequential or random. The current implementation (before Lars's patch, HBASE-7336) treats only Get as random and thus uses pread for it. In our scenario, there are two kinds of scans: a) from the online system and b) from MapReduce (MR) jobs. Most scans of type a) read no more than one block and are expected to return within tens of milliseconds.
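
One possible heuristic, sketched below in Java, is to treat every scan as random (pread) at first and switch to streaming reads only after it has read several contiguous blocks. This is only an assumption about how such a classifier could look, not what HBase or any patch does; ScanReadModeSketch, ReadMode, onBlockRead and the threshold parameter are all hypothetical names.

```java
// Hypothetical sketch; the names and the threshold are assumptions, not HBase APIs.
class ScanReadModeSketch {

    enum ReadMode { PREAD, STREAM }

    private final int blocksBeforeStream; // contiguous blocks tolerated before switching
    private long nextExpectedOffset = -1; // file offset where a contiguous read would start
    private int consecutiveBlocks = 0;    // length of the current contiguous run

    ScanReadModeSketch(int blocksBeforeStream) {
        this.blocksBeforeStream = blocksBeforeStream;
    }

    // Records one block read and returns the mode suggested for the next read.
    ReadMode onBlockRead(long offset, int length) {
        if (offset == nextExpectedOffset) {
            consecutiveBlocks++;   // the scan continued where the last block ended
        } else {
            consecutiveBlocks = 1; // a jump starts a new run
        }
        nextExpectedOffset = offset + length;
        return consecutiveBlocks > blocksBeforeStream ? ReadMode.STREAM : ReadMode.PREAD;
    }

    public static void main(String[] args) {
        ScanReadModeSketch scan = new ScanReadModeSketch(2);
        System.out.println(scan.onBlockRead(0, 100));    // first block of a run
        System.out.println(scan.onBlockRead(100, 100));  // second contiguous block
        System.out.println(scan.onBlockRead(200, 100));  // third contiguous block
        System.out.println(scan.onBlockRead(5000, 100)); // jump elsewhere in the file
    }
}
```

With a threshold like this, short online scans that touch one block would stay on pread, while long MR scans would cross the threshold after a few blocks. The right threshold is an open question and would need measurement.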
                
> HFile block pre-loading for large sequential scan
> -------------------------------------------------
>
>                 Key: HBASE-9102
>                 URL: https://issues.apache.org/jira/browse/HBASE-9102
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.89-fb
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The current HBase scan model cannot take full advantage of the aggregate disk throughput, especially for large sequential scans. For a large sequential scan, it is easy to predict the next block to read, so the data blocks can be pre-loaded from HDFS and decompressed/decoded into the block cache just ahead of the current read point.
> Therefore, this JIRA aims to optimize large sequential scan performance by pre-loading HFile blocks into the block cache in a streaming fashion, so that the scan can read directly from the cache.
