Posted to issues@hbase.apache.org by "Eshcar Hillel (JIRA)" <ji...@apache.org> on 2017/01/26 09:28:24 UTC

[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

    [ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839458#comment-15839458 ] 

Eshcar Hillel commented on HBASE-17339:
---------------------------------------

The attached patch is incomplete and not properly tested, so it may have some bugs (but it does compile :) ).
I'm posting it to get feedback on the core logic.
The main property needed for this optimization is monotonicity. A store preserves *monotonicity* if all timestamps in its memstore are strictly greater than all timestamps in its store files.
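
To make the definition concrete, here is a minimal sketch of the monotonicity predicate. It is an illustration only and not code from the patch: the class name, the method name and the use of plain timestamp lists are assumptions. Monotonicity matters because, when it holds, any version found in memory is newer than every version on disk.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase: checks the monotonicity property
// over plain lists of cell timestamps.
class MonotonicityCheck {
    /**
     * Returns true if every memstore timestamp is strictly greater than
     * every store-file timestamp. An empty side is treated as trivially
     * monotonic (an assumption of this sketch).
     */
    static boolean isMonotonic(List<Long> memstoreTimestamps, List<Long> storeFileTimestamps) {
        if (memstoreTimestamps.isEmpty() || storeFileTimestamps.isEmpty()) {
            return true;
        }
        long minInMemory = Collections.min(memstoreTimestamps);
        long maxFlushed = Collections.max(storeFileTimestamps);
        // Strict inequality: equal timestamps already violate the property.
        return minInMemory > maxFlushed;
    }
}
```

For example, a put carrying a client-supplied timestamp that is older than (or equal to) an already flushed timestamp would make this check fail.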

The algorithm is as follows:
{code}
0. decide whether to apply the optimization: (1) the flag is on, and (2) the get operation is over a specific set of columns
if decided to apply the optimization then
  1. open all relevant *memory* scanners;
     while opening scanners, collect the max flushed timestamp of each store (first collect);
     a null timestamp indicates the store does not maintain monotonicity
  2. if all stores are monotonic then
     2.1 get results
     2.2 validate monotonicity: check that the max flushed timestamps have not changed in any store
         (the double collect ensures results are taken from a consistent view)
if decided not to apply the optimization
   *OR* stores are not monotonic
   *OR* decided to apply the optimization but results do not satisfy the get operation (not enough versions per column)
then
  3. open all scanners
  4. get results
{code}
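
The control flow above can be sketched in Java roughly as follows. This is a simplified illustration under stated assumptions, not the code in the patch: `StoreView`, `FakeStore` and `get` are hypothetical names, each store view is assumed to be already narrowed to the requested row and columns, "enough versions per column" is reduced to a single result count, and a failed validation in step 2.2 is assumed to fall back to the full scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the memory-first get; none of these types are HBase APIs.
class MemoryFirstGetSketch {
    interface StoreView {
        /** Max flushed timestamp, or null if the store is not monotonic. */
        Long maxFlushedTimestamp();
        /** Results from memory components only. */
        List<String> readMemory();
        /** Results from memory and disk components. */
        List<String> readAll();
    }

    /** In-memory stand-in for a store, used only to exercise the sketch. */
    static final class FakeStore implements StoreView {
        Long maxFlushed;
        final List<String> memory;
        final List<String> disk;

        FakeStore(Long maxFlushed, List<String> memory, List<String> disk) {
            this.maxFlushed = maxFlushed;
            this.memory = memory;
            this.disk = disk;
        }

        public Long maxFlushedTimestamp() { return maxFlushed; }
        public List<String> readMemory() { return new ArrayList<>(memory); }
        public List<String> readAll() {
            List<String> all = new ArrayList<>(memory);
            all.addAll(disk);
            return all;
        }
    }

    static List<String> get(List<? extends StoreView> stores, int versionsNeeded,
                            boolean optimizationOn) {
        if (optimizationOn) {
            // Step 1: first collect of max flushed timestamps.
            List<Long> firstCollect = new ArrayList<>();
            boolean allMonotonic = true;
            for (StoreView store : stores) {
                Long ts = store.maxFlushedTimestamp();
                if (ts == null) {
                    allMonotonic = false;
                }
                firstCollect.add(ts);
            }
            if (allMonotonic) {
                // Step 2.1: read memory components only.
                List<String> results = new ArrayList<>();
                for (StoreView store : stores) {
                    results.addAll(store.readMemory());
                }
                // Step 2.2: second collect; any change means a flush interfered.
                boolean unchanged = true;
                for (int i = 0; i < stores.size(); i++) {
                    if (!firstCollect.get(i).equals(stores.get(i).maxFlushedTimestamp())) {
                        unchanged = false;
                    }
                }
                if (unchanged && results.size() >= versionsNeeded) {
                    return results;
                }
            }
        }
        // Steps 3-4: fall back to scanning memory and disk.
        List<String> results = new ArrayList<>();
        for (StoreView store : stores) {
            results.addAll(store.readAll());
        }
        return results;
    }
}
```

With a monotonic store holding one version in memory and two on disk, a get that needs one version is served from memory alone, while a get that needs two versions takes the fallback path and reads everything.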

Missing parts (TODOs):
- properly init maxFlushedTimestamp (in AbstractMemStore) when recovering -- this requires traversing all existing store files
- make memoryScanOptimization a table property instead of a global property; set to true by default
- (Optional) add a flag to the Get operation that indicates whether the user wants to apply the optimization (per operation!); set to true by default
- (Optional) check whether we can change the implementation of getScanners in XXXMemstore to return multiple scanners, so that we can later filter them individually instead of either keeping all or eliminating all. Currently the implementation (in both the default and the compacting memstore) returns a singleton list with one MemStoreScanner, which comprises one or a few segment scanners.
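
For the first TODO, the recovery-time initialization could look roughly like the sketch below. It is only an illustration: the class, the method and the sentinel for a store without store files are hypothetical, and it assumes the max timestamp of each store file is already available as a plain long.

```java
import java.util.List;

// Hypothetical sketch, not the AbstractMemStore code: derive the initial
// maxFlushedTimestamp from the existing store files on recovery.
class MaxFlushedTimestampInit {
    /** Assumed sentinel for a store that has no store files yet. */
    static final long NO_FLUSHED_DATA = Long.MIN_VALUE;

    /** Returns the largest per-file max timestamp, or the sentinel if there are no files. */
    static long initFromStoreFiles(List<Long> storeFileMaxTimestamps) {
        long max = NO_FLUSHED_DATA;
        for (long ts : storeFileMaxTimestamps) {
            max = Math.max(max, ts);
        }
        return max;
    }
}
```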


> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch
>
>
> The current implementation of a get operation (to retrieve values for a specific key) scans through all relevant stores of the region; for each store, both memory components (memstore segments) and disk components (hfiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only components first, and scans both memory and disk only if the result is incomplete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)