Posted to issues@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2012/10/11 22:25:03 UTC

[jira] [Created] (HBASE-6980) Parallel Flushing Of Memstores

Kannan Muthukkaruppan created HBASE-6980:
--------------------------------------------

             Summary: Parallel Flushing Of Memstores
                 Key: HBASE-6980
                 URL: https://issues.apache.org/jira/browse/HBASE-6980
             Project: HBase
          Issue Type: New Feature
            Reporter: Kannan Muthukkaruppan
            Assignee: Kannan Muthukkaruppan


For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are not set up to take advantage of the aggregate throughput that multi-disk nodes provide.

* For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. [Topic for a separate JIRA.]

* But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.
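
A minimal sketch of the idea, assuming a fixed pool of flusher threads in place of the single flusher. The names ParallelFlushSketch, Memstore and flushAll are hypothetical and not taken from the HBase code base; the flush itself is a stand-in that only reports a byte count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from the HBase code base.
public class ParallelFlushSketch {

    /** Stand-in for one region's memstore. */
    static final class Memstore {
        final String region;
        final long sizeBytes;

        Memstore(String region, long sizeBytes) {
            this.region = region;
            this.sizeBytes = sizeBytes;
        }

        /** A real flush would write a store file; here we only report the bytes "written". */
        long flush() {
            return sizeBytes;
        }
    }

    /** Flushes every memstore on a pool of flusherThreads workers; returns total bytes flushed. */
    static long flushAll(List<Memstore> memstores, int flusherThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(flusherThreads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (Memstore m : memstores) {
                Callable<Long> task = m::flush;
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : pending) {
                total += f.get(); // wait for each flush to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flushes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a flush failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Memstore> stores = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            stores.add(new Memstore("region-" + i, 64L * 1024 * 1024));
        }
        System.out.println("flushed bytes: " + flushAll(stores, 4));
    }
}
```

With one worker this degenerates to the current single-flusher behavior; the pool size would presumably be tuned to the number of disks on the node.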

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478622#comment-13478622 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

Todd: If the RS's ZK session expires and the master initiates recovery/log splitting, the first step is to rename the log directory from .logs/rs to .logs/rs-splitting. Lease recovery is then done on the individual files within the directory. Because the directory has been renamed, any attempt by the old RS to delete old log files (under the old path) should fail. Therefore, I still don't see the value of writing the flush marker.
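
The rename argument can be illustrated with a local-filesystem analogy. This is only a sketch using java.nio.file on a temp directory, not HDFS or the real log-splitting code; the class name and the file name hlog.1 are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Local-filesystem analogy only; HDFS and the HBase split code are not involved.
public class RenameFencingSketch {

    /** Returns true if a delete through the pre-rename path fails and the log survives. */
    static boolean staleDeleteFails() {
        try {
            Path root = Files.createTempDirectory("logs");
            Path rsDir = Files.createDirectory(root.resolve("rs"));
            Path oldLogPath = Files.createFile(rsDir.resolve("hlog.1"));

            // "Master" starts recovery: rs -> rs-splitting.
            Path splitting = Files.move(rsDir, root.resolve("rs-splitting"));

            boolean deleteFailed;
            try {
                Files.delete(oldLogPath); // the "old RS" still uses the old path
                deleteFailed = false;
            } catch (NoSuchFileException e) {
                deleteFailed = true;
            }

            // The log must still be present under the renamed directory.
            Path movedLog = splitting.resolve("hlog.1");
            boolean survived = Files.exists(movedLog);

            // Clean up the temp tree.
            Files.delete(movedLog);
            Files.delete(splitting);
            Files.delete(root);
            return deleteFailed && survived;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stale delete fails: " + staleDeleteFails());
    }
}
```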
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478212#comment-13478212 ] 

Todd Lipcon commented on HBASE-6980:
------------------------------------

If I remember correctly, there is a reason for the flush marker: it ensures that the RS hasn't been fenced on HDFS -- i.e., that it hasn't lost its connection to ZK and already had its log splitting started.

The reason this is important is that, otherwise, it could move on to delete old log segments, which would potentially break the log split process.

It may be that the locking can be more lax, though.
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476543#comment-13476543 ] 

Ted Yu commented on HBASE-6980:
-------------------------------

Here is the javadoc for cacheFlushLock:
{code}
  // This lock prevents starting a log roll during a cache flush.
  // synchronized is insufficient because a cache flush spans two method calls.
  private final ReentrantReadWriteLock cacheFlushLock = new ReentrantReadWriteLock();
{code}

                

[jira] [Updated] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6980:
-----------------------------------------

    Description: 
For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are not set up to take advantage of the aggregate throughput that multi-disk nodes provide.

* For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA-- HBASE-6981).

* But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.

  was:
For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are not set up to take advantage of the aggregate throughput that multi-disk nodes provide.

* For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA-- HBASE-6980).

* But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.

    

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477647#comment-13477647 ] 

ramkrishna.s.vasudevan commented on HBASE-6980:
-----------------------------------------------

bq.#1. It is not clear why we even write a META entry for flushes...
Yes. This entry is not actually used, but it still forms the latest entry in the log. Currently, 0.94 and trunk use a map to form the name of the replayedits file, which should carry the maximum seq id of the edits. Previously, as I remember, the minimum seq id was used for naming the replayEdits file.
In one of the issues we were discussing the usefulness of the META entry written after a flush. We can verify once again and remove it if it is not useful.

bq.we track the min seq id from the current memstore instead of the max seq id from the snapshot memstore
The HLog keeps track of the min seq id for the region. So are you suggesting that we track only the max seq id whenever an append happens to the HLog? Then, on flush start, we just clear this entry and use that max value for completing the flush.
Thanks for the insights.  
 
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487263#comment-13487263 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

#1) This change has now been committed in 89-fb. Here's the commit info:

   http://svn.apache.org/viewvc?view=revision&revision=1403627

#2) This change should also help region server restart time and cluster restart time: when a region server is stopped its memstores need to be flushed, and those flushes can now happen in parallel.

                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476571#comment-13476571 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

Ted:

#1. I did see that javadoc. But I am still not clear why log rolling and flushes need to be mutually exclusive of each other. As long as we correctly track what is the min sequence id that has not yet been flushed, log rolling can maintain its correctness independent of the actual on-going flushes, right?

#2. Also, in the code I saw for trunk:

src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:191:  

private final Lock cacheFlushLock = new ReentrantLock();

the lock was a ReentrantLock.

In the code you pasted, however, it is a ReentrantReadWriteLock.

Can you confirm which svn repo you are referring to? The one I checked was: https://svn.apache.org/repos/asf/hbase/trunk/hbase-server.
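
On point #1, a rough sketch of what "tracking the min sequence id that has not yet been flushed" could look like. Everything here is an assumption made for illustration: the class name, the two maps, and the use of String region names are not HLog's actual bookkeeping. Each method holds the monitor only for a brief map update, so a log roll would not have to wait for the duration of a flush.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping sketch; names and structure are assumptions, not HLog's code.
public class MinUnflushedSeqIdSketch {
    // Region -> lowest sequence id held in the live memstore.
    private final Map<String, Long> unflushed = new HashMap<>();
    // Region -> lowest sequence id in a snapshot that is currently being flushed.
    private final Map<String, Long> flushing = new HashMap<>();

    /** Called on every WAL append; only the first edit since the last flush start is recorded. */
    synchronized void onAppend(String region, long seqId) {
        unflushed.putIfAbsent(region, seqId);
    }

    /** Flush start: existing edits belong to the snapshot; later appends start a new minimum. */
    synchronized void startFlush(String region) {
        Long min = unflushed.remove(region);
        if (min != null) {
            flushing.put(region, min);
        }
    }

    /** Flush complete: the snapshot's edits are now in a store file. */
    synchronized void completeFlush(String region) {
        flushing.remove(region);
    }

    /** Lowest sequence id any region still needs from the logs, or Long.MAX_VALUE if none. */
    synchronized long minOutstanding() {
        long min = Long.MAX_VALUE;
        for (long v : unflushed.values()) {
            min = Math.min(min, v);
        }
        for (long v : flushing.values()) {
            min = Math.min(min, v);
        }
        return min;
    }

    /** A rolled log can be archived once every edit in it has been flushed. */
    synchronized boolean canArchive(long logMaxSeqId) {
        return logMaxSeqId < minOutstanding();
    }

    public static void main(String[] args) {
        MinUnflushedSeqIdSketch t = new MinUnflushedSeqIdSketch();
        t.onAppend("r1", 2);
        t.onAppend("r2", 4);
        t.startFlush("r1");
        t.onAppend("r1", 5);
        System.out.println("archive log ending at 3 during flush: " + t.canArchive(3));
        t.completeFlush("r1");
        System.out.println("archive log ending at 3 after flush: " + t.canArchive(3));
    }
}
```

An aborted flush is not handled here; a fuller version would need to move the snapshot's minimum back into the live map.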

regards,
Kannan
                

[jira] [Updated] (HBASE-6980) Parallel Flushing Of Memstores [89-fb]

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6980:
-----------------------------------------

    Summary: Parallel Flushing Of Memstores [89-fb]  (was: Parallel Flushing Of Memstores)
    

[jira] [Updated] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6980:
-----------------------------------------

    Description: 
For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are not set up to take advantage of the aggregate throughput that multi-disk nodes provide.

* For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA-- HBASE-6980).

* But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.

  was:
For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are not set up to take advantage of the aggregate throughput that multi-disk nodes provide.

* For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. [Topic for a separate JIRA.]

* But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.

    

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476756#comment-13476756 ] 

ramkrishna.s.vasudevan commented on HBASE-6980:
-----------------------------------------------

@Kannan
I can try to explain what I see from the code. I am sure you have checked this already, but just in case you have not. I may be wrong, so please correct me.
From the flush code we see that once we call startFlush() we acquire the cacheFlushLock.
The lock is released in completeCacheFlush(). It is there that we write the latest seq id corresponding to that flush. Suppose the seq ids 2, 3, 4 are getting flushed; we then write seq id 5 for the current flush entry.
As the javadoc also says, now suppose a log roll happens without the cacheFlushLock:
{code}
synchronized (updateLock) {
        // Clean up current writer.
        Path oldFile = cleanupCurrentWriter(currentFilenum);
        this.writer = nextWriter;
        this.hdfs_out = nextHdfsOut;
{code}
The log writer may be changed, and completeCacheFlush() may then write to the new file (if I am not wrong).
Also, currently whenever we flush, the oldest seq id for the region is removed from lastSeqWritten and put back under a snapshot name derived from the encoded region name:
{code}
Long oldseq =
        lastSeqWritten.put(getSnapshotName(encodedRegionName), seq);
{code}
This was done by FB for a data loss issue. So maybe, if we don't acquire the cacheFlushLock in rollWriter(), rollWriter() may see some regions (under the snapshot name) that have the min seq id and will try to flush them too, in
{code}
 private byte [] getOldestRegion(final Long oldestOutstandingSeqNum) {
    byte [] oldestRegion = null;
    for (Map.Entry<byte [], Long> e: this.lastSeqWritten.entrySet()) {
      if (e.getValue().longValue() == oldestOutstandingSeqNum.longValue()) {
        // Key is encoded region name.
        oldestRegion = e.getKey();
        break;
      }
    }
    return oldestRegion;
  }
{code}
Maybe other experts can give a better answer if I am not right here. Thanks.
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores [89-fb]

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492466#comment-13492466 ] 

Andrew Purtell commented on HBASE-6980:
---------------------------------------

Since this was committed to 89-fb should this issue be resolved? Looks like forward porting would be addressed by HBASE-6466.
                

[jira] [Updated] (HBASE-6980) Parallel Flushing Of Memstores [89-fb]

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6980:
-----------------------------------------

    Fix Version/s: 0.89-fb
           Status: Patch Available  (was: Open)

http://svn.apache.org/viewvc?view=revision&revision=1403627
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476602#comment-13476602 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

The patch is also using reader-writer locks as I had mentioned... but the open question still remains: can we do better? I suppose for the purpose of parallelizing flusher threads, that suffices. But can anyone think of a good reason why cacheFlushLock needs to be grabbed by the log rolling code at all?
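
For reference, a minimal sketch of the reader-writer arrangement, assuming flushers take the read side of cacheFlushLock and the log roller takes the write side. The class name, tryRollWriter() and demo() are hypothetical, and using tryLock() so the roll is skipped rather than blocked is a choice made for this example, not something taken from the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the locking scheme; not the HLog implementation.
public class FlushRollLockSketch {
    private final ReentrantReadWriteLock cacheFlushLock = new ReentrantReadWriteLock();

    /** Flushers share the read lock, so several flushes can be in progress at once. */
    void startCacheFlush() {
        cacheFlushLock.readLock().lock();
    }

    /** Must be called by the same thread that called startCacheFlush(). */
    void completeCacheFlush() {
        cacheFlushLock.readLock().unlock();
    }

    /** Rolls the log only if no flush is in progress; returns false if the roll was skipped. */
    boolean tryRollWriter() {
        if (!cacheFlushLock.writeLock().tryLock()) {
            return false;
        }
        try {
            // The WAL writer would be swapped here.
            return true;
        } finally {
            cacheFlushLock.writeLock().unlock();
        }
    }

    /** Runs two concurrent flushers; returns {bothInside, rollDuringFlush, rollAfterFlush}. */
    static boolean[] demo() {
        FlushRollLockSketch wal = new FlushRollLockSketch();
        CountDownLatch inside = new CountDownLatch(2);
        CountDownLatch release = new CountDownLatch(1);
        Runnable flusher = () -> {
            wal.startCacheFlush();
            try {
                inside.countDown();
                release.await(); // hold the read lock until told to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                wal.completeCacheFlush();
            }
        };
        Thread a = new Thread(flusher);
        Thread b = new Thread(flusher);
        a.start();
        b.start();
        try {
            boolean bothInside = inside.await(5, TimeUnit.SECONDS);
            boolean rollDuring = wal.tryRollWriter();
            release.countDown();
            a.join();
            b.join();
            boolean rollAfter = wal.tryRollWriter();
            return new boolean[] { bothInside, rollDuring, rollAfter };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("both flushers inside: " + r[0]);
        System.out.println("roll during flush: " + r[1]);
        System.out.println("roll after flush: " + r[2]);
    }
}
```

This only shows that parallel flushers and an exclusive log roll can coexist; it does not answer whether the roll needs the lock at all.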
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores [89-fb]

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487266#comment-13487266 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

@Todd: any comments on my 17/Oct comment regarding RS expiry handling and the flush marker?
                
> Parallel Flushing Of Memstores [89-fb]
> --------------------------------------
>
>                 Key: HBASE-6980
>                 URL: https://issues.apache.org/jira/browse/HBASE-6980
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>             Fix For: 0.89-fb
>
>
> For write dominated workloads, single threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are basically not setup to take advantage of the aggregate throughput that multi-disk nodes provide.
> * For puts with WAL enabled, the bottleneck is more likely the "single" WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA-- HBASE-6981).
> * But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.


[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476579#comment-13476579 ] 

Ted Yu commented on HBASE-6980:
-------------------------------

It turned out that I was looking at a local copy that included HBASE-6466.
HBASE-6466 has a similar goal to this JIRA.
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476528#comment-13476528 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

I did a quick prototype against 89-fb, with the expected results. In my test setup I was doing WAL-less puts; previously I wasn't able to go much beyond 100 MB/second of ingest into HBase, but with parallel flushing I was able to get a 3-4x improvement.

Two locks got in the way of the implementation (I temporarily just commented them out in the prototype):

* In MemStoreFlusher.java, the lock variable named "lock" is acquired in interruptIfNecessary(), apparently to ensure that an orderly shutdown happens only after any in-progress flush completes. Because flushRegion() also grabs the same lock, we will need to figure out whether we can simply get rid of the lock or use a reader-writer lock (such that the flushers grab it in read mode and the interrupt grabs it in write mode).

* In HLog.java, startCacheFlush()/completeCacheFlush() grab the cacheFlushLock. This lock is also grabbed by the log roller (rollWriter()) and by HLog.close(). It is not yet clear to me why rollWriter() needs to grab the cacheFlushLock.
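
To make the reader-writer option in the first bullet concrete, here is a minimal sketch. It is only an illustration under stated assumptions: FlusherLockSketch and its fields are hypothetical stand-ins, not the actual MemStoreFlusher code, and the real shutdown path may need more than a "stopped" flag.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the flusher; NOT the actual MemStoreFlusher code.
class FlusherLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger completedFlushes = new AtomicInteger();
    private volatile boolean stopped = false;

    // Flusher threads take the shared read lock, so several flushes
    // can be in progress at the same time.
    boolean flushRegion(Runnable flushWork) {
        lock.readLock().lock();
        try {
            if (stopped) {
                return false; // shutdown already requested; skip the flush
            }
            flushWork.run();
            completedFlushes.incrementAndGet();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Shutdown takes the exclusive write lock, which is granted only after
    // every in-progress flush has released its read lock.
    void interruptIfNecessary() {
        lock.writeLock().lock();
        try {
            stopped = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int completedFlushes() {
        return completedFlushes.get();
    }
}
```

With this shape, concurrent flushers never block each other, while interruptIfNecessary() still waits for in-flight flushes before marking the flusher as stopped.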

If anyone has further thoughts on a good resolution for the above locks or the exact original intent for those locks (Stack?), please share your ideas.

                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477194#comment-13477194 ] 

Karthik Ranganathan commented on HBASE-6980:
--------------------------------------------

@ramakrishna - this should not be necessary for ensuring no data loss, right? Once we have a snapshot memstore, we should automatically know the max seq id up to which it has data; that would never change.

1. From what I remember of the code (when I was looking into something unrelated), we track the *min* seq id from the current memstore instead of the max seq id from the snapshot memstore to put into the HLog when it's rolled after a flush. So this synchronization becomes necessary. If we store the max seq id along with the memstore that is flushed, we should be able to eliminate the locks.

2. Also, it's arguable whether we need the absolutely correct max-seq-id flushed. In a very small % of cases, we would end up rolling logs a bit slower. As long as we are conservative with updating the max seq id in the HLog we should be good, right?
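
The two points above could be sketched roughly as follows. This is an assumption-laden illustration, not HBase code: SnapshotSeqIdSketch, Snapshot, completeFlush() and canArchive() are hypothetical names. Each snapshot carries its own max seq id, and the per-region "flushed up to" value only ever moves forward, so a stale or out-of-order completion can at worst delay log cleanup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical types for illustration; none of these are HBase classes.
class SnapshotSeqIdSketch {
    // An immutable memstore snapshot that records the highest seq id it holds.
    static final class Snapshot {
        final String region;
        final long maxSeqId;

        Snapshot(String region, long maxSeqId) {
            this.region = region;
            this.maxSeqId = maxSeqId;
        }
    }

    // Per region: highest seq id known to be persisted in a flushed file.
    private final ConcurrentMap<String, Long> flushedUpTo = new ConcurrentHashMap<>();

    // Called by any flusher thread once its snapshot is durably written.
    // merge() with Long::max keeps the value from ever moving backwards.
    void completeFlush(Snapshot s) {
        flushedUpTo.merge(s.region, s.maxSeqId, Long::max);
    }

    // A log may be archived only if, for every region with edits in it,
    // that region has flushed at least up to its last edit in the log.
    boolean canArchive(Map<String, Long> regionMaxSeqIdInLog) {
        for (Map.Entry<String, Long> e : regionMaxSeqIdInLog.entrySet()) {
            if (flushedUpTo.getOrDefault(e.getKey(), -1L) < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

Because the max seq id travels with the snapshot, no lock has to be held across the flush just to keep that value stable; being conservative only means a log might be kept a little longer than strictly necessary.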
                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477575#comment-13477575 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

Ramakrishna,

Thanks for your email.

#1. It is not clear why we even write a META entry for flushes...

{code}
private WALEdit completeCacheFlushLogEdit() {
  KeyValue kv = new KeyValue(METAROW, METAFAMILY, null,
    System.currentTimeMillis(), COMPLETE_CACHE_FLUSH);
  WALEdit e = new WALEdit();
  e.add(kv);
  return e;
}
{code}

The replayRecoveredEdits() logic skips over these entries anyway. And the only reference I see for this special entry in HLog is in unit tests.

#2. Yes, currently there are a lot of comments (related to lastSeqWritten) before the function HLog.java:startCacheFlush(), but the logic is not very clear to me. The changes were committed as part of HBASE-3845. I think we should be able to simplify that logic. I think I see some potential bugs there even as it stands now; I will need to spend some more time looking at this, and will write down an update here.

But bottom line, I still don't see any good fundamental reason we need to hold this lock for the duration of the entire flush (even given the lastSeqWritten map logic).
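
As a rough illustration of what "not holding the lock for the entire flush" could look like, here is a single-region sketch. Everything in it is an assumption made for the example: NarrowFlushLockSketch is hypothetical, it ignores the lastSeqWritten map entirely, and it only borrows the method names from the discussion above.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-region sketch; names echo the thread but this is NOT HLog code.
class NarrowFlushLockSketch {
    private final ReentrantLock cacheFlushLock = new ReentrantLock();
    private long nextSeqId = 1;        // guarded by cacheFlushLock
    private long lastFlushedSeqId = 0; // guarded by cacheFlushLock

    // Simulates appending an edit; returns the seq id assigned to it.
    long append() {
        cacheFlushLock.lock();
        try {
            return nextSeqId++;
        } finally {
            cacheFlushLock.unlock();
        }
    }

    // Short critical section: capture the highest seq id the snapshot contains.
    long startCacheFlush() {
        cacheFlushLock.lock();
        try {
            return nextSeqId - 1;
        } finally {
            cacheFlushLock.unlock();
        }
    }

    // Short critical section: publish the result, never moving backwards.
    void completeCacheFlush(long flushedSeqId) {
        cacheFlushLock.lock();
        try {
            lastFlushedSeqId = Math.max(lastFlushedSeqId, flushedSeqId);
        } finally {
            cacheFlushLock.unlock();
        }
    }

    // The slow write runs between the two critical sections, lock released.
    long flush(Runnable writeSnapshotToDisk) {
        long seqId = startCacheFlush();
        writeSnapshotToDisk.run();
        completeCacheFlush(seqId);
        return seqId;
    }

    boolean isLockHeldByCurrentThread() {
        return cacheFlushLock.isHeldByCurrentThread();
    }

    long lastFlushedSeqId() {
        cacheFlushLock.lock();
        try {
            return lastFlushedSeqId;
        } finally {
            cacheFlushLock.unlock();
        }
    }
}
```

In this sketch appends and other flushes are blocked only for the two brief bookkeeping steps, not for the disk write in between. Whether the real lastSeqWritten logic allows the same narrowing is exactly the open question in this thread.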

                

[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476595#comment-13476595 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

Thanks for the pointer; I wasn't aware of that JIRA. I am ok with closing this as a dup, or keeping this for the 89-fb patch, as the code base is slightly different in some parts. (I think the HBASE-6466 description doesn't adequately capture the motivation/wins we can get from this, especially for the WAL-less ingest type of use case. But we can update the description for that JIRA to reflect those aspects.)

I will check the patch to see how Chunhui is getting around the locks. 
                