Posted to issues@hbase.apache.org by "Lars George (JIRA)" <ji...@apache.org> on 2012/08/02 07:44:02 UTC

[jira] [Created] (HBASE-6497) Revisit HLog sizing and roll parameters

Lars George created HBASE-6497:
----------------------------------

             Summary: Revisit HLog sizing and roll parameters
                 Key: HBASE-6497
                 URL: https://issues.apache.org/jira/browse/HBASE-6497
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
            Reporter: Lars George


The last major update to the HLog sizing and roll features was done in HBASE-1394. I propose revisiting these settings to overcome recent issues where the HLog becomes a major bottleneck.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429350#comment-13429350 ] 

Jean-Daniel Cryans commented on HBASE-6497:
-------------------------------------------

bq. Less parallelization per RS. If you have a lot of RSes, lowering file count does help reduce HBase RPCs too?

I'm not sure I understand what you mean. HBase RPCs in which context?
                
[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

Posted by "Harsh J (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427260#comment-13427260 ] 

Harsh J commented on HBASE-6497:
--------------------------------

bq. Should we increase the maxlogs number (default is 32)?

We should decrease the maxlogs number for recovery to be faster, right? Increasing it helps prevent premature flushes on regions caused by the log roller during heavy writes, but it hurts recovery time, as all of those HLogs will then get processed unnecessarily.

To keep recovery times the same as today, we can switch the numbers this way:

If the current setup is 128 MB x 32 = 4096 MB (approx. 4 GB) of logs before a full flush, then let's change that to fewer than 32 files (which reduces NN RPCs during recovery and increases the sequential read length): 8 maxlogs at a 512 MB default size (8 x 512 = 4096 again). Or we could set a target of 8 GB and work out from that?
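
The arithmetic above can be checked with a small sketch. It assumes the log volume held before a full flush is simply the per-file size times maxlogs, ignoring the roll multiplier; the function name is hypothetical and not part of HBase.

```python
def wal_capacity_mb(log_size_mb: int, max_logs: int) -> int:
    """Approximate MB of WAL data kept before a full flush is forced.

    Simplifying assumption: every log file is rolled at exactly log_size_mb.
    """
    return log_size_mb * max_logs


current = wal_capacity_mb(128, 32)   # today's layout from the comment above
proposed = wal_capacity_mb(512, 8)   # fewer, larger files
print(current, proposed)

# Working backwards from an 8 GB target with 512 MB files:
target_mb = 8 * 1024
logs_for_target = target_mb // 512
print(logs_for_target)
```

Both layouts hold the same total, so under this simple model only the file count (and therefore the number of files to open and split on recovery) changes.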
                
[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427615#comment-13427615 ] 

Jean-Daniel Cryans commented on HBASE-6497:
-------------------------------------------

bq. Is there a need to keep the logs small (typically 64-128 depending on the HDFS config)?

bq. If current is 128 MB x 32 = 4096 MB (4 GB) of logs approx. before full flush, then lets change that to have fewer than 32 files (reduces NN RPCs during recovery and increases the sequential read length) on to 8 maxlogs at 512 MB default size (8x512 = 4096 again).

Issues with bigger files while having less of them:

 - Less parallelization during distributed splitting since the unit of distribution is a file.
 - Fewer opportunities to get rid of logs without having to force flush regions. The worst case would be a maximum of 1 file, meaning that when you roll you need to force flush everything that hasn't been flushed yet.
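
Both points can be illustrated with a toy model. This is only a sketch under stated assumptions: a log file is treated as releasable once every region with edits in it has flushed, and split parallelism is capped by the number of files. The function names and the edit sequence are made up for illustration and are not HBase code.

```python
def releasable_logs(edits, edits_per_log, flushed_regions):
    """Return (releasable, total) log files for a sequence of edits.

    edits: region names in write order, one entry per edit.
    A file is releasable when all regions with edits in it have flushed.
    """
    logs = [edits[i:i + edits_per_log]
            for i in range(0, len(edits), edits_per_log)]
    releasable = sum(1 for log in logs if set(log) <= flushed_regions)
    return releasable, len(logs)


def usable_split_workers(num_files, num_workers):
    """The unit of distribution is a file, so files cap the parallelism."""
    return min(num_files, num_workers)


# Regions A and B have flushed on their own; region C has not.
edits = ["A"] * 4 + ["B"] * 4 + ["A"] * 4 + ["C"] * 4
flushed = {"A", "B"}

for per_log in (4, 8, 16):
    done, total = releasable_logs(edits, per_log, flushed)
    print(f"{per_log} edits per log: {done} of {total} files releasable")

print(usable_split_workers(num_files=8, num_workers=20))
```

With small files most of the log can be dropped without touching region C; with one large file nothing can be dropped until C is force flushed.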
                
[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

Posted by "Lars George (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427130#comment-13427130 ] 

Lars George commented on HBASE-6497:
------------------------------------

The goal in designing a proper HBase schema is to maximize heap usage across all regions, which can lead to a situation where the WALs (aka HLogs) have to be kept for a considerable amount of time.

The last iteration on WAL properties added a configurable block size, as well as a threshold percentage to roll the log before it completely fills a single HDFS block (see HBASE-1394).
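
A minimal sketch of how those two settings interact, assuming the roll threshold is the block size times the multiplier. The property names `hbase.regionserver.hlog.blocksize` and `hbase.regionserver.logroll.multiplier`, and the 0.95 default used here, are recalled from HBase configuration of that era and should be verified against the release in use; the function name is hypothetical.

```python
MB = 1024 * 1024


def roll_threshold_bytes(block_size_bytes: int, multiplier: float = 0.95) -> int:
    """Size at which the log is rolled, assuming threshold = block size * multiplier."""
    return int(block_size_bytes * multiplier)


for block_mb in (64, 128, 512):
    threshold = roll_threshold_bytes(block_mb * MB)
    print(f"{block_mb} MB block -> roll at about {threshold / MB:.1f} MB")
```

Under this assumption the log always rolls slightly before the block fills, which is the behavior the threshold percentage was introduced for.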

I am questioning whether this is still an issue, especially in light of recent improvements in log performance, for example HBASE-5699 and HBASE-4608.

At the least, I would like to figure out whether we should increase the WAL size to 512 MB to avoid getting into early flushing situations that impact overall I/O. Isn't HBASE-1364 helping to split larger logs (not the individual logs themselves, but by distributing them across the region servers)? I am not sure whether log splitting prefers block-local nodes first so that there is no remote reading, though.

Questions:

# Is there a need to keep the logs small (typically 64-128 depending on the HDFS config)?
# Should we go multiple blocks?
# Do we still need the logroll multiplier?
# Should we increase the maxlogs number (default is 32)?
                
[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

Posted by "Harsh J (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428544#comment-13428544 ] 

Harsh J commented on HBASE-6497:
--------------------------------

bq. Less parallelization during distributed splitting since the unit of distribution is a file.

Less parallelization per RS. If you have a lot of RSes, lowering file count does help reduce HBase RPCs too?
                
[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427403#comment-13427403 ] 

Andrew Purtell commented on HBASE-6497:
---------------------------------------

> Is there a need to keep the logs small (typically 64-128 depending on the HDFS config)?

At least when replication is active, yes, we do want to hold down the log size. But that would not be the default case.

> Should we increase the maxlogs number (default is 32)?

With distributed splitting, I think we can.
                