Posted to issues@hbase.apache.org by "Liyin Tang (Created) (JIRA)" <ji...@apache.org> on 2012/02/15 19:43:03 UTC
[jira] [Created] (HBASE-5403) Checkpoint the compressed HLog
Checkpoint the compressed HLog
------------------------------
Key: HBASE-5403
URL: https://issues.apache.org/jira/browse/HBASE-5403
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Let's assume that HBase replication can be based on replaying the HLog in the replica cluster.
The replica process could crash during the replay, so it needs a way to restart from the latest checkpoint in the HLog, even when the HLog is compressed.
So the proposal is to write a series of checkpoints within the HLog.
Each checkpoint will have a header with a special sequence of bytes.
At each checkpoint, the HLog should start a new compression dictionary, so that the records between two checkpoints can be decompressed on their own.
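A minimal sketch of the idea, not taken from the HBase code base: it assumes a toy record format (a tag byte followed by either a literal key or a dictionary index) and a made-up four-byte checkpoint header. The class name, `MAGIC`, `write`, and `readFrom` are all hypothetical. It only illustrates that a reader can start decoding at a checkpoint offset because the dictionary is reset there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in HBase.
class CheckpointedLogSketch {
    // Assumed checkpoint header. Its first byte doubles as a record tag.
    static final byte[] MAGIC = {(byte) 0xCA, (byte) 0xFE, (byte) 0xD0, (byte) 0x0D};
    static final int LITERAL = 0;   // record carries the full key
    static final int DICT_REF = 1;  // record carries a dictionary index

    // Encodes keys, writing a checkpoint header and clearing the dictionary
    // every `interval` records. Checkpoint byte offsets go into `offsets`.
    static byte[] write(List<String> keys, int interval, List<Integer> offsets)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            if (i % interval == 0) {
                offsets.add(out.size());
                out.write(MAGIC);
                dict.clear();
            }
            String key = keys.get(i);
            Integer index = dict.get(key);
            if (index == null) {
                byte[] raw = key.getBytes(StandardCharsets.UTF_8);
                out.writeByte(LITERAL);
                out.writeShort(raw.length);
                out.write(raw);
                dict.put(key, dict.size());
            } else {
                out.writeByte(DICT_REF);
                out.writeShort(index);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Decodes from a checkpoint offset without reading anything before it.
    static List<String> readFrom(byte[] log, int offset) throws IOException {
        if (log[offset] != MAGIC[0]) {
            throw new IllegalArgumentException("offset is not a checkpoint");
        }
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(log, offset, log.length - offset));
        List<String> dict = new ArrayList<>();
        List<String> keys = new ArrayList<>();
        while (in.available() > 0) {
            int tag = in.readUnsignedByte();
            if (tag == (MAGIC[0] & 0xFF)) {
                byte[] rest = new byte[MAGIC.length - 1];
                in.readFully(rest);
                if (!Arrays.equals(rest, Arrays.copyOfRange(MAGIC, 1, MAGIC.length))) {
                    throw new IOException("corrupt checkpoint header");
                }
                dict.clear();  // the writer started a fresh dictionary here
            } else if (tag == LITERAL) {
                byte[] raw = new byte[in.readUnsignedShort()];
                in.readFully(raw);
                String key = new String(raw, StandardCharsets.UTF_8);
                dict.add(key);
                keys.add(key);
            } else if (tag == DICT_REF) {
                keys.add(dict.get(in.readUnsignedShort()));
            } else {
                throw new IOException("unknown record tag " + tag);
            }
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        List<String> keys = Arrays.asList("row1", "row1", "row2", "row1", "row2", "row2");
        List<Integer> offsets = new ArrayList<>();
        byte[] log = write(keys, 3, offsets);
        // Resume at the latest checkpoint, as a restarted replica would.
        System.out.println(readFrom(log, offsets.get(offsets.size() - 1)));
    }
}
```

The sketch passes checkpoint offsets to the reader out of band. A reader that has to find checkpoints by scanning for the header bytes would also need some guarantee that the same byte sequence cannot appear inside record data, which this toy format does not provide.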
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5403) Checkpoint the compressed HLog
Posted by "Ian Varley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446220#comment-13446220 ]
Ian Varley commented on HBASE-5403:
-----------------------------------
Liyin, this issue has been dormant for a little while. Any thoughts on Todd's suggestion? Should we keep this open as an option, or close / alter it in favor of addressing it on the replay side?
[jira] [Commented] (HBASE-5403) Checkpoint the compressed HLog
Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208752#comment-13208752 ]
Li Pi commented on HBASE-5403:
------------------------------
Rolling the log would just reset the dictionary, which means performance will be degraded for a bit until the dictionary is built back up again.
I'm assuming checkpointing would involve dumping the contents of the dictionary at certain points, but the maximum size of the dictionary can be quite large, up to 32 megabytes or so in extreme cases. This has its own problems.
[jira] [Commented] (HBASE-5403) Checkpoint the compressed HLog
Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208724#comment-13208724 ]
Nicolas Spiegelberg commented on HBASE-5403:
--------------------------------------------
We had discussed this: https://issues.apache.org/jira/browse/HBASE-4608?focusedCommentId=13192604
What is the benefit of checkpointing versus just rolling the log?
[jira] [Commented] (HBASE-5403) Checkpoint the compressed HLog
Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208843#comment-13208843 ]
Todd Lipcon commented on HBASE-5403:
------------------------------------
Another option is to address this at the "read" side, since the failure recovery is probably a rare case. If the log reader sees an error, it can record the offset, start over, and "skip" records until it gets back to the point where it wants to start reading again. If we expect these failure scenarios to be unlikely, maybe this is better?
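A rough sketch of this read-side approach, under the same kind of assumptions as any toy example: the record format (a tag byte followed by a literal key or a dictionary index) and the names `SkipReplaySketch`, `encode`, and `replayFrom` are hypothetical and not HBase APIs. The reader always starts at byte 0 so the dictionary is rebuilt, but it only hands back records that begin at or after the recorded offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in HBase.
class SkipReplaySketch {
    static final int LITERAL = 0;   // record carries the full key
    static final int DICT_REF = 1;  // record carries a dictionary index

    // Writes a dictionary-compressed log with no checkpoints at all.
    static byte[] encode(List<String> keys) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Map<String, Integer> dict = new HashMap<>();
        for (String key : keys) {
            Integer index = dict.get(key);
            if (index == null) {
                byte[] raw = key.getBytes(StandardCharsets.UTF_8);
                out.writeByte(LITERAL);
                out.writeShort(raw.length);
                out.write(raw);
                dict.put(key, dict.size());
            } else {
                out.writeByte(DICT_REF);
                out.writeShort(index);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Starts over from byte 0 to rebuild the dictionary, skipping (decoding
    // but not returning) every record that begins before resumeOffset.
    static List<String> replayFrom(byte[] log, int resumeOffset) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        List<String> dict = new ArrayList<>();
        List<String> replayed = new ArrayList<>();
        while (in.available() > 0) {
            int recordStart = log.length - in.available();
            int tag = in.readUnsignedByte();
            String key;
            if (tag == LITERAL) {
                byte[] raw = new byte[in.readUnsignedShort()];
                in.readFully(raw);
                key = new String(raw, StandardCharsets.UTF_8);
                dict.add(key);
            } else if (tag == DICT_REF) {
                key = dict.get(in.readUnsignedShort());
            } else {
                throw new IOException("unknown record tag " + tag);
            }
            if (recordStart >= resumeOffset) {
                replayed.add(key);
            }
        }
        return replayed;
    }

    public static void main(String[] args) throws IOException {
        byte[] log = encode(Arrays.asList("row1", "row2", "row1", "row2"));
        // Offset 14 is where the third record starts in this toy encoding.
        System.out.println(replayFrom(log, 14));
    }
}
```

The cost is that recovery re-reads the whole prefix of the log, which is the trade-off being weighed here: no extra bytes on the write path, slower restarts in what is expected to be the rare case.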
[jira] [Commented] (HBASE-5403) Checkpoint the compressed HLog
Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208831#comment-13208831 ]
Liyin Tang commented on HBASE-5403:
-----------------------------------
@Nicolas, the block size in the DFS is usually set quite large, say 256 MB, and it is inefficient to write small log files that are less than one DFS block. I assume this is the main benefit of checkpointing vs. rolling the log.