Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/04/06 23:18:06 UTC
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

    [ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016539#comment-13016539 ] 

stack commented on HBASE-1364:
------------------------------

I did the top 50% or so again over on rb.  Over there I wrote "...I'd be up for committing this sooner rather than later, especially if it's basically working for you.  My thought is this is a big patch and it's critical functionality, so commit and get the rest of the community helping iron out bugs.  Let us know when you think it's good to commit."

FYI Prakash, you need to update here when you post a new patch, at least for the moment, because email notifications are not working (we'll be moving to Apache's review board instance some time soon).

> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
>                 Key: HBASE-1364
>                 URL: https://issues.apache.org/jira/browse/HBASE-1364
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors
>            Reporter: stack
>            Assignee: Prakash Khemani
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-1364.patch
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster.
> (Below is from HBASE-1008)
> In the Bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find only one reconstruction log. We need to make it so they can pick up a bunch of edit logs, and it should be fine that the logs are elsewhere in HDFS, in an output directory written by all split participants, whether multithreaded or a mapreduce-like distributed process (Let's write our distributed sort first as an MR job so we learn what's involved; the distributed sort should use MR framework pieces as much as possible). On startup, regions go to this directory and pick up the files written by split participants, deleting and clearing the dir when all have been read in. Making it so regions can take multiple logs as input can also make the split process more robust than the current tenuous process, which loses all edits if it doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. The split can sort the edits by column family so each store reads only its own edits.
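
For illustration only, the two points above can be sketched in a few lines of Python. This is not HBase code and not what the attached patch does: the `Edit` record, its field names, and the functions `split_one_log`, `split_all` and `merge_for_region` are all hypothetical, and in-memory dicts stand in for the files that split participants would write to the output directory in HDFS.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, NamedTuple


class Edit(NamedTuple):
    """One commit-log entry. Hypothetical shape; field names are illustrative."""
    region: str
    family: str
    seq_id: int
    payload: bytes


def split_one_log(edits: Iterable[Edit]) -> Dict[str, List[Edit]]:
    """Work of a single split participant: bucket one log's edits by region."""
    by_region: Dict[str, List[Edit]] = defaultdict(list)
    for edit in edits:
        by_region[edit.region].append(edit)
    return dict(by_region)


def split_all(logs: List[List[Edit]]) -> List[Dict[str, List[Edit]]]:
    """Split every log independently; threads stand in for split participants."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(split_one_log, logs))


def merge_for_region(
    region: str, outputs: Iterable[Dict[str, List[Edit]]]
) -> Dict[str, List[Edit]]:
    """What a region would do on startup: collect its edits from every
    participant's output and group them by column family, ordered by
    sequence id, so each store reads only its own edits."""
    by_family: Dict[str, List[Edit]] = defaultdict(list)
    for output in outputs:
        for edit in output.get(region, []):
            by_family[edit.family].append(edit)
    for family_edits in by_family.values():
        family_edits.sort(key=lambda e: e.seq_id)
    return dict(by_family)
```

Because each log is split independently, a failure in one participant would only require redoing that log rather than losing all edits, which is the robustness argument made in point 1.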

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira