Posted to issues@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2012/10/16 00:25:06 UTC
[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476528#comment-13476528 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

I did a quick prototype against 89-fb, and the results were as expected. In my test setup I was doing WAL-less puts and previously could not get much beyond 100MB/second of ingest into HBase; with parallel flushing, I was able to get a 3-4x improvement.

Two locks got in the way of the implementation (I temporarily just commented them out in the prototype):

* In MemStoreFlusher.java, the lock variable named "lock" seems to be acquired in interruptIfNecessary() to ensure that an orderly shutdown happens only after any in-progress flush completes. Because flushRegion() grabs the same lock, we will need to figure out whether we can simply get rid of the lock or use a reader-writer lock (such that the flushers grab it in read mode, and the interrupt grabs it in write mode).

* In HLog.java: startCacheFlush/completeCacheFlush() grab the cacheFlushLock. This lock is also grabbed by the log roller (rollWriter()) and HLog.close() methods. It is not clear to me yet why the rollWriter() needs to grab the cacheFlushLock.
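
To make the reader-writer idea from the first bullet concrete, here is a minimal sketch using java.util.concurrent.locks.ReentrantReadWriteLock. It is not the actual MemStoreFlusher code: the class ParallelFlusherSketch, the shutdownRequested flag, the counters, and the flushersOverlap() helper are all hypothetical names made up for illustration, and only the method names flushRegion() and interruptIfNecessary() are borrowed from the discussion above. Flushers take the lock in read mode so they can overlap; the interrupt path takes it in write mode so it waits for in-progress flushes.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the flusher; NOT the real MemStoreFlusher.
class ParallelFlusherSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger inProgress = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();
    private volatile boolean shutdownRequested = false; // assumed flag

    // Flush path: shared (read) mode, so several flushers may hold it at once.
    boolean flushRegion(Runnable flushWork) {
        lock.readLock().lock();
        try {
            if (shutdownRequested) {
                return false; // shutdown already began; skip the flush
            }
            inProgress.incrementAndGet();
            try {
                flushWork.run();
            } finally {
                inProgress.decrementAndGet();
            }
            completed.incrementAndGet();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Shutdown path: exclusive (write) mode, so it blocks until every
    // in-progress flush has released its read lock. Returns the number of
    // flushes seen in progress while holding the write lock.
    int interruptIfNecessary() {
        lock.writeLock().lock();
        try {
            shutdownRequested = true;
            return inProgress.get();
        } finally {
            lock.writeLock().unlock();
        }
    }

    int completedFlushes() {
        return completed.get();
    }

    // Runs n flushers whose work can only finish if all n are inside
    // flushRegion() at the same time (they meet at a barrier).
    static boolean flushersOverlap(int n) {
        ParallelFlusherSketch flusher = new ParallelFlusherSketch();
        CyclicBarrier allInside = new CyclicBarrier(n);
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                flusher.flushRegion(() -> {
                    try {
                        allInside.await(5, TimeUnit.SECONDS);
                    } catch (Exception e) {
                        throw new IllegalStateException(e);
                    }
                });
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return flusher.completedFlushes() == n;
    }

    public static void main(String[] args) {
        System.out.println("flushers overlapped: " + flushersOverlap(4));
        ParallelFlusherSketch flusher = new ParallelFlusherSketch();
        System.out.println("in progress at shutdown: " + flusher.interruptIfNecessary());
        System.out.println("flush after shutdown ran: " + flusher.flushRegion(() -> { }));
    }
}
```

With a single exclusive lock, the barrier in flushersOverlap() would time out, because only one flusher could be inside flushRegion() at a time. Whether the lock can instead be dropped entirely depends on the original intent behind it, which this sketch does not settle.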

If anyone has further thoughts on a good resolution for the above locks or the exact original intent for those locks (Stack?), please share your ideas.

                
> Parallel Flushing Of Memstores
> ------------------------------
>
>                 Key: HBASE-6980
>                 URL: https://issues.apache.org/jira/browse/HBASE-6980
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are basically not set up to take advantage of the aggregate throughput that multi-disk nodes provide.
> * For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA-- HBASE-6981).
> * But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.
