Posted to issues@hbase.apache.org by "chunhui shen (JIRA)" <ji...@apache.org> on 2012/06/05 03:37:23 UTC

[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

    [ https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289067#comment-13289067 ] 

chunhui shen commented on HBASE-6134:
-------------------------------------

On the review board, Prakash Khemani said:
bq."But the old code had a serious drawback – it would read the entire log file in memory before writing it out. Also the old code assumed that multiple log files were being split at the same time, but that is no longer true with distributed log splitting.
Whatever approach we take, I don’t think we should re-introduce buffering of the entire log file in memory."

Since we cap the buffer size at 128MB, I don't see what harm using a buffer would do. In any case, log splitting happens infrequently.

What do others think?


                
> Improvement for split-worker to speed up distributed-split-log
> --------------------------------------------------------------
>
>                 Key: HBASE-6134
>                 URL: https://issues.apache.org/jira/browse/HBASE-6134
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch
>
>
> First, we ran a test comparing local-master-splitting with distributed-log-splitting.
> Environment: 34 hlog files, 5 regionservers (after killing one, only 4 regionservers do the splitting work), 400 regions in one hlog file
> local-master-split: 60s+
> distributed-log-splitting: 165s+
> In fact, in our production environment, distributed-log-splitting also took 60s with 30 regionservers for 34 hlog files (the regionservers may have been under high load).
> We found that a split-worker took about 20s to split one log file
> (30ms~50ms per writer.close(); 10ms per writer creation).
> I think we could improve this by:
> parallelizing the creation and closing of writers in threads.
> In the patch, the logic for distributed-log-splitting is changed to match local-master-splitting, and the close is parallelized in threads.
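
The numbers in the description suggest why closing writers one by one dominates: assuming roughly one writer per region, 400 writers x (10ms create + 30ms~50ms close) comes to about 16s~24s, in line with the reported 20s per log file. The following is only a minimal sketch of closing writers in parallel, not the code from the attached patch. The class name ParallelCloseSketch, the method closeAll, and the use of plain java.io.Closeable in place of the real HLog writer type are all hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; java.io.Closeable stands in for the real writer type.
final class ParallelCloseSketch {

    /**
     * Closes every writer on a fixed-size thread pool and waits for all of them.
     * Returns the number of writers closed; if any close failed, throws an
     * IOException wrapping the first failure after all closes were attempted.
     */
    static int closeAll(List<? extends Closeable> writers, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (Closeable writer : writers) {
                Callable<Void> task = () -> {
                    writer.close();   // the slow call (30ms~50ms in the report)
                    return null;
                };
                pending.add(pool.submit(task));
            }
            int closed = 0;
            IOException first = null;
            for (Future<Void> f : pending) {
                try {
                    f.get();          // blocks until this close has finished
                    closed++;
                } catch (ExecutionException e) {
                    if (first == null) {
                        first = new IOException("close failed", e.getCause());
                    }
                }
            }
            if (first != null) {
                throw first;
            }
            return closed;
        } finally {
            pool.shutdown();          // all tasks are done; release the threads
        }
    }
}
```

With a pool of N threads the wall-clock cost of the close phase drops to roughly 1/N of the sequential cost, provided the underlying filesystem can serve the closes concurrently. The pool size is a tuning choice and is not taken from the patch.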

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira