Posted to dev@hbase.apache.org by Ted Yu <te...@yahoo.com> on 2011/04/09 16:00:23 UTC

Review Request: Speedup LoadIncrementalHFiles

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/572/
-----------------------------------------------------------

Review request for hbase and Todd Lipcon.


Summary
-------

I refactored LoadIncrementalHFiles so that tryLoad() queues work items in a List<ServerCallable<Void>>. doBulkLoad() periodically sends a batch of ServerCallables to the HBase cluster.
I added the following method to HConnection/HConnectionManager:
    public <T> void getRegionServerWithRetries(ExecutorService pool,
        List<ServerCallable<T>> callables, Object[] results)
This method uses a thread pool to send multiple ServerCallables, each through the existing getRegionServerWithRetries(ServerCallable<T> callable).
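
For illustration only, the fan-out described above could be sketched roughly as follows. This is not the code in the diff: java.util.concurrent.Callable stands in for ServerCallable, the class and method names (BatchSubmitSketch, runBatch) are hypothetical, retries are omitted, and storing a failure's cause in the results array is an assumption about how Object[] results might be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchSubmitSketch {

    // Hypothetical stand-in for the proposed
    // getRegionServerWithRetries(pool, callables, results).
    // Callable<T> plays the role of ServerCallable<T>; retry logic is omitted.
    static <T> void runBatch(ExecutorService pool,
                             List<Callable<T>> callables,
                             Object[] results) {
        // Submit every work item first so they can run concurrently.
        List<Future<T>> futures = new ArrayList<>(callables.size());
        for (Callable<T> c : callables) {
            futures.add(pool.submit(c));
        }
        // Collect outcomes in submission order.
        for (int i = 0; i < futures.size(); i++) {
            try {
                results[i] = futures.get(i).get();
            } catch (ExecutionException e) {
                // Assumption: a failed item records its cause in its slot.
                results[i] = e.getCause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                results[i] = e;
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<String>> work = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int id = i;
            work.add(() -> "loaded-" + id);
        }
        Object[] results = new Object[work.size()];
        runBatch(pool, work, results);
        pool.shutdown();
        for (Object r : results) {
            System.out.println(r);
        }
    }
}
```

Submitting everything before waiting on any Future lets a slow item overlap with the others in the batch instead of blocking them.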

I introduced two new config parameters: hbase.loadincremental.threads.max and hbase.loadincremental.batch.size.
hbase.loadincremental.batch.size sets the number of queued work items above which HConnection.getRegionServerWithRetries() is called. In Adam's case there are many small HFiles, so LoadIncrementalHFiles shouldn't wait until all HFiles have been scanned before sending work.
hbase.loadincremental.threads.max controls the maximum number of threads in the thread pool.
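
As a rough sketch of the batching behaviour, assuming a flush is triggered once the queue reaches the configured size (the exact comparison in the diff may differ): java.util.Properties stands in for the HBase Configuration object, the values 4 and 10 are made up for the example, and BatchThresholdSketch and flushSizes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class BatchThresholdSketch {

    // Simulates queueing totalItems work items and flushing whenever the
    // queue reaches batchSize. Returns the size of each flushed batch, in order.
    static List<Integer> flushSizes(int totalItems, int batchSize) {
        List<Integer> flushed = new ArrayList<>();
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < totalItems; i++) {
            pending.add(i);
            if (pending.size() >= batchSize) {
                // In the real tool this is where the batch would be sent.
                flushed.add(pending.size());
                pending.clear();
            }
        }
        // Send whatever is left after the last HFile has been scanned.
        if (!pending.isEmpty()) {
            flushed.add(pending.size());
        }
        return flushed;
    }

    public static void main(String[] args) {
        // Properties is only a stand-in for the real configuration object.
        Properties conf = new Properties();
        conf.setProperty("hbase.loadincremental.batch.size", "4");
        // The fallback of 10 is an illustrative value, not a documented default.
        int batchSize = Integer.parseInt(
            conf.getProperty("hbase.loadincremental.batch.size", "10"));
        System.out.println(flushSizes(10, batchSize)); // prints [4, 4, 2]
    }
}
```

With many small HFiles, a smaller batch size means the first loads start sooner, at the cost of more round trips through the pool.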


This addresses bug HBASE-3721.
    https://issues.apache.org/jira/browse/HBASE-3721


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/client/HConnection.java 1090500 
  /src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1090500 
  /src/main/java/org/apache/hadoop/hbase/client/HTable.java 1090500 
  /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1090500 

Diff: https://reviews.apache.org/r/572/diff


Testing
-------

TestLoadIncrementalHFiles and TestHFileOutputFormat pass.


Thanks,

Ted