Posted to dev@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2009/08/09 08:12:14 UTC
[jira] Commented: (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

    [ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741032#action_12741032 ] 

Jean-Daniel Cryans commented on HBASE-1364:
-------------------------------------------

The WAL situation currently looks like this:

A region server has a default maximum of 30 log files.
A log file is synced every 10 seconds or when it reaches 100 entries.
A log file is rolled when it gets to 95% of a block size.

What's happening during a log split is:
Until all log files are split
 - 3 readers read one log each into memory
 - 3 writers output to different oldlogfile.log files

That means that in the worst case 10 iterations will be done, a maximum of 182MB of data will be held in memory
during the reading phase, and a maximum of 1824MB will pass through memory... probably getting us some GC pauses.
Also, supposing that the failing RS had 100 regions and only 10 of them had edits, you hold off 90 regions for nothing.
The only thing holding us back is that we don't have any knowledge about which regions are in which logs.
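
For reference, the worst-case figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the 64MB block size is an assumption (it is not stated in this comment, but it is the value that makes the 182MB and 1824MB figures come out), and the variable names are made up for illustration.

```python
import math

MAX_LOGS = 30         # default maximum number of log files per region server
READERS = 3           # logs read into memory at the same time during a split
ROLL_FRACTION = 0.95  # a log is rolled at 95% of a block
BLOCK_SIZE_MB = 64    # ASSUMPTION: block size, not stated in the comment

log_size_mb = BLOCK_SIZE_MB * ROLL_FRACTION   # size of one full log, about 60.8MB
iterations = math.ceil(MAX_LOGS / READERS)    # passes needed to read every log
in_memory_mb = READERS * log_size_mb          # held in memory during one pass
total_mb = MAX_LOGS * log_size_mb             # passes through memory overall

print(f"{iterations} iterations, {in_memory_mb:.0f}MB per pass, {total_mb:.0f}MB total")
```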

One proposition: 
Have every hlog publish its location in ZK, both under every region's folder and in its own folder (both folders are new in 0.21).
When a RS crashes, just reassign the whole thing right away.
When a RS opens a new region, it should check if there's any hlog in the region's ZK folder.
If so, it should ask other RSs to help it split the logs and watch the logs (something wise).
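
To make the bookkeeping concrete, here is a minimal in-memory model of the two proposed ZK folders. It is a sketch only: `HlogRegistry` and its methods are hypothetical names, plain dictionaries stand in for znodes, and no real ZooKeeper API is used.

```python
from collections import defaultdict


class HlogRegistry:
    """Hypothetical stand-in for the proposed ZK layout (not a ZooKeeper API)."""

    def __init__(self):
        self.by_region = defaultdict(set)  # region -> hlogs that hold its edits
        self.by_hlog = defaultdict(set)    # hlog -> regions it holds edits for

    def publish(self, hlog_path, regions):
        """Record the hlog under each region's folder and under its own folder."""
        for region in regions:
            self.by_region[region].add(hlog_path)
            self.by_hlog[hlog_path].add(region)

    def pending_hlogs(self, region):
        """What a RS checks when it opens a region: hlogs still to be split."""
        return sorted(self.by_region.get(region, ()))

    def mark_split(self, hlog_path):
        """Drop the hlog's entries once it has been split."""
        for region in self.by_hlog.pop(hlog_path, ()):
            self.by_region[region].discard(hlog_path)
```

With this kind of index, a region whose `pending_hlogs` list is empty could be opened right away instead of waiting for the whole split.
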
Log splitting:
 - Take a lock on the hlog folder.
 - Split the log into as many files as there are regions and put the files in the region's folder in HDFS.
 - oldlogfile.log should now have a sequence id.
 - Finally delete the hlog znodes.
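
The splitting step itself could look roughly like the sketch below. It works on in-memory tuples rather than real hlog files or HDFS, and `split_log` and its argument shapes are assumptions made for illustration only.

```python
from collections import defaultdict


def split_log(log_seq_id, edits):
    """Group one log's edits by region, tagging each output with the log's sequence id.

    edits: iterable of (region, edit) tuples in log order (hypothetical shape).
    Returns {region: (log_seq_id, [edit, ...])}, one entry per region seen in the log.
    """
    per_region = defaultdict(list)
    for region, edit in edits:
        per_region[region].append(edit)  # order within the log is preserved
    return {region: (log_seq_id, region_edits)
            for region, region_edits in per_region.items()}
```

Each value in the result corresponds to one per-region file, and the sequence id is what would let the opening RS order files coming from different logs.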

When all logs are split for a region, the RS opening it opens the logs file by file (in order) and runs them.
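
A sketch of that ordered replay, assuming each split file is represented as a `(seq_id, edits)` pair and that `apply_edit` is whatever callback applies one edit to the region (both are hypothetical):

```python
def replay_region(split_files, apply_edit):
    """Apply a region's split files in ascending sequence-id order.

    split_files: list of (seq_id, [edit, ...]) pairs, in any order.
    apply_edit: callback invoked once per edit.
    """
    for _seq_id, edits in sorted(split_files, key=lambda f: f[0]):
        for edit in edits:
            apply_edit(edit)
```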

When a region server tries to split a log and the lock is already held, it should watch the lock.
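
The lock-then-watch behaviour can be modelled as below. This is a single-process sketch, not ZooKeeper code: `WatchableLock` is a hypothetical class, and the watchers fire once and are then cleared, which is how I would expect a one-shot watch to behave.

```python
class WatchableLock:
    """Hypothetical model of a lock that can be watched while someone else holds it."""

    def __init__(self):
        self.holder = None
        self.watchers = []

    def try_acquire(self, who, on_release=None):
        """Take the lock if it is free; otherwise register a watch and return False."""
        if self.holder is None:
            self.holder = who
            return True
        if on_release is not None:
            self.watchers.append(on_release)
        return False

    def release(self, who):
        """Release the lock and notify every watcher exactly once."""
        if self.holder != who:
            raise RuntimeError("not the lock holder")
        self.holder = None
        watchers, self.watchers = self.watchers, []
        for notify in watchers:
            notify()
```

A RS that fails `try_acquire` does not block; it gets the callback when the holder finishes and can then check whether the log still needs splitting.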


> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
>                 Key: HBASE-1364
>                 URL: https://issues.apache.org/jira/browse/HBASE-1364
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.21.0
>
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster.
> (Below is from HBASE-1008)
> In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits.
