Posted to dev@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2010/02/12 23:42:28 UTC

[jira] Commented: (HBASE-2070) Collect HLogs and delete them after a period of time

    [ https://issues.apache.org/jira/browse/HBASE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833211#action_12833211 ] 

ryan rawson commented on HBASE-2070:
------------------------------------

If a replication stream is delayed, we should never delete log files unless the disk space situation is critical.  Clusters that send replication should have enough disk space to buffer edits through any foreseeable disconnection.  This might mean buffering 5-10 TB of edits...

The alternative is to reset the slave cluster and rebuild it from scratch once you lose sync.  Otherwise you end up with duplicate edits that cannot be removed.
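
A minimal sketch of the retention rule described above, not HBase code. The names LogFile, may_delete, free_fraction and critical_free_fraction are hypothetical, and the 5% threshold is an assumed placeholder for "critical" disk space.

```python
from dataclasses import dataclass


@dataclass
class LogFile:
    name: str
    age_seconds: float
    replicated: bool  # assumed flag: True once the slave cluster has these edits


def may_delete(log, ttl_seconds, free_fraction, critical_free_fraction=0.05):
    """Hypothetical policy: keep unreplicated logs unless disk space is critical."""
    if log.age_seconds < ttl_seconds:
        # Still inside the configured retention window: always keep.
        return False
    if log.replicated:
        # Old enough and no longer needed by the replication stream.
        return True
    # Replication is behind: delete only when free space is critically low.
    return free_fraction < critical_free_fraction
```

With this rule an expired log that has not been replicated yet survives as long as free_fraction stays at or above the threshold, which is the buffering behaviour argued for here.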

> Collect HLogs and delete them after a period of time
> ----------------------------------------------------
>
>                 Key: HBASE-2070
>                 URL: https://issues.apache.org/jira/browse/HBASE-2070
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2070-v2.patch, HBASE-2070-v3.patch, HBASE-2070-v4.patch, HBASE-2070.patch
>
>
> For replication we need to be able to service clusters that are a few hours behind in edits. For example, after distcp'ing a snapshot of the DB to another cluster, we need to make sure we get the edits that came in after the snapshot was taken.
> I plan the following changes:
> - Instead of deleting HLogs during a log roll or after a log split, move them to another folder where all logs should be aggregated.
> - Add a new configuration for how old a log can be. For a normal cluster I am thinking of a default of 2 hours. For replication you may want to set it much higher.
> - Create a new thread in the master that checks for logs older than the configured time and deletes them.
> I would also like the deletion time to be configurable while the cluster is running. I'm also thinking of adding a way to tell the cluster to replay edits on itself.
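
The plan quoted above (archive logs instead of deleting them, then remove the ones older than a configured age) could be sketched roughly as follows. This is an illustration on a local filesystem, not the attached patch: archive_log, clean_old_logs and the max_age_seconds parameter are hypothetical names, and file modification time stands in for the age of a log.

```python
import os
import shutil
import time


def archive_log(log_path, archive_dir):
    """Move a finished log into the shared archive folder instead of deleting it."""
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(log_path))
    shutil.move(log_path, target)
    return target


def clean_old_logs(archive_dir, max_age_seconds, now=None):
    """Delete archived logs whose modification time is older than max_age_seconds."""
    now = time.time() if now is None else now
    deleted = []
    for name in sorted(os.listdir(archive_dir)):
        path = os.path.join(archive_dir, name)
        # Only plain files are considered; age is measured from the last write.
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            deleted.append(name)
    return deleted
```

A periodic thread would call clean_old_logs on each pass. Because the maximum age is passed in on every call, it can be changed between passes without a restart, which is one way to get the runtime-configurable deletion time mentioned in the description.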
