Posted to commits@cassandra.apache.org by "Oleg Anastasyev (JIRA)" <ji...@apache.org> on 2010/10/11 13:18:33 UTC

[jira] Created: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
------------------------------------------------------------------------------

Key: CASSANDRA-1602
URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
Project: Cassandra
Issue Type: New Feature
Components: Core, Tools
Reporter: Oleg Anastasyev

As a couple of people on the mailing list suggested (see the discussion at http://comments.gmane.org/gmane.comp.db.cassandra.user/9423 ), I am sharing a patch that retains (archives) the commit logs generated by Cassandra and restores data by rolling those commit logs forward onto previously backed-up or snapshotted data files.

Here are instructions on how to use it, extracted from our internal wiki:

We rely on the Cassandra replication factor for disaster recovery.

But there is another problem: bugs in data manipulation logic can destroy data across the whole cluster, and the freshest backup to restore from is the last snapshot, which can be up to 24 hours old.

To guard against this, we decided to implement a snapshot + log archive backup strategy, i.e. we collect commit logs and snapshotted data files. In the event of data loss, whether due to hardware failure or a logical bug, we restore the last snapshot and roll forward onto it all logs collected since the snapshot was taken.

Cassandra does not support log archiving out of the box, so I implemented it myself.

The idea is simple:
# As soon as a commit log file is no longer needed by Cassandra, a hard link (unix command "ln $1 $2") is created from the just-closed commit log file into the commit log archive directory. Both the commit log and the commit log archive are on the same volume, of course.
# Some script (whose authoring I left to the admin) then takes files from the commit log archive dir and copies them over the network to a backup location. As soon as a file has been transferred, it can be safely deleted from the commit log archive dir.
# Don't forget that there must also be a script (also authored by admins) which takes data snapshots from time to time using the "nodetool snapshot" command, available in the standard Cassandra distribution, and copies the snapshot files to the backup location.
## Creating a snapshot is a very light operation for Cassandra: under the hood it just hard-links the currently existing files into a "snapshot/<timestamp-millis>" directory. So the frequency is limited only by our ability to pull the snapshotted data files over the network.
# To restore data, the admin must:
## stop the Cassandra instance
## remove all corrupted data files from the /data directory. Leave in the /commitlog dir only the commit logs you want to roll forward
## copy the last snapshot's data files to /data
## copy all archived commit log files to /commitlog (only files no older than the snapshot data files need to be copied; copying older ones does no harm but extends processing time)
## *without starting* the Cassandra instance, run the roll forward utility (bin/logreplay) with option -forced or -forcedcompaction and wait for it to complete
## then start the Cassandra node instance as usual.
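
The log selection in the restore steps above (copy only archived logs no older than the snapshot) could be scripted roughly as follows. This is a minimal sketch and not part of the patch: the function names `select_logs_for_replay` and `stage_logs` are hypothetical, and it assumes the backup preserved each file's modification time, so that mtime can stand in for the age of a log.

```python
import shutil
from pathlib import Path
from typing import List


def select_logs_for_replay(backup_logs_dir: Path, snapshot_time: float) -> List[Path]:
    """Return backed-up commit logs modified at or after snapshot_time, oldest first.

    Assumption: a log whose last write precedes the snapshot holds only
    mutations that the snapshot already contains, so it can be skipped.
    """
    candidates = [p for p in backup_logs_dir.iterdir() if p.is_file()]
    recent = [p for p in candidates if p.stat().st_mtime >= snapshot_time]
    return sorted(recent, key=lambda p: p.stat().st_mtime)


def stage_logs(logs: List[Path], commitlog_dir: Path) -> None:
    """Copy the selected logs into the commit log directory, keeping timestamps."""
    commitlog_dir.mkdir(parents=True, exist_ok=True)
    for log in logs:
        shutil.copy2(log, commitlog_dir / log.name)
```

Since copying logs that are too old does no harm, a script like this can err on the side of an earlier `snapshot_time` when in doubt.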

Log archive logic is activated using the CommitLogArchive directive in storage-conf.xml:

{code:xml}
<CommitLogDirectory>/commitlog</CommitLogDirectory>
<CommitLogArchive>true</CommitLogArchive>
<DataFileDirectories>
<DataFileDirectory>/data</DataFileDirectory>
</DataFileDirectories>
{code}
Log files will be archived to the <commit-dir>/.archive directory, i.e. in the example above, to /commitlog/.archive.
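
To illustrate why a hard link into <commit-dir>/.archive is enough to retain a segment, here is a small sketch of the same idea in Python. It is not the patch's Java code; the function name `archive_closed_segment` is hypothetical, and the caller is assumed to pass only segments that are already closed.

```python
import os
from pathlib import Path


def archive_closed_segment(segment: Path, archive_dir: Path) -> Path:
    """Hard-link a closed commit log segment into archive_dir and return the link.

    os.link cannot cross filesystems, which is why the archive directory
    has to live on the same volume as the commit log directory.
    """
    archive_dir.mkdir(exist_ok=True)
    link = archive_dir / segment.name
    os.link(segment, link)  # same effect as: ln <segment> <link>
    return link
```

Because both names point at the same file on disk, the archived copy survives when the original segment is later deleted, and creating it costs no extra data I/O.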

The archived log replay process is launched by running org.apache.cassandra.tools.ReplayLogs with option "-forced" locally on the node (use option "-forcedcompact" to do a major compaction right after the log roll forward completes). I also made a script named <cassandra-dir>/bin/logreplay.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Anastasyev updated CASSANDRA-1602:
---------------------------------------

    Attachment: 1602-cassandra0.6.txt

Attached patch rebased to current 0.6 branch

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt
>
>

[jira] Commented: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920451#action_12920451 ] 

Oleg Anastasyev commented on CASSANDRA-1602:
--------------------------------------------

Creating the hard link that early is not good: some people back up commit logs to a safe location, and if the hard link is created before the commit log segment is actually closed, it is hard to determine whether the file is ready to copy. That's why in my code the hard link is created only after the commit log segment is closed and will never be written to again.

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt, 1602-v2.txt
>
>

[jira] Issue Comment Edited: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919820#action_12919820 ] 

Jonathan Ellis edited comment on CASSANDRA-1602 at 10/11/10 9:45 AM:
---------------------------------------------------------------------

can you rebase to 0.6 svn branch head?  http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.6/ (tried; there are a few merge conflicts)

      was (Author: jbellis):
    can you rebase to 0.6 svn branch head?  http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.6/
  
> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.4
>
>         Attachments: 1602-0.6.4.txt
>
>

[jira] Issue Comment Edited: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920451#action_12920451 ] 

Oleg Anastasyev edited comment on CASSANDRA-1602 at 10/13/10 1:53 AM:
----------------------------------------------------------------------

Creating the hard link that early is not good: some people back up commit logs to a safe location, and if the hard link is created before the commit log segment is actually closed, it is hard to determine whether the file is ready to copy. That's why in my code the hard link is created only after the commit log segment is closed and will never be written to again.
Creating the hard link after the commit log is closed makes this decision very simple: as soon as a file appears in the archive directory, it can be copied to a safe location, and removed as soon as the copy is finished.
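
A shipping script built on that rule might look like the sketch below. It is only an illustration under stated assumptions: the function name `ship_archived_logs` is hypothetical, and `backup_dir` stands in for the safe location (for example a mounted remote volume), since the real transfer mechanism is left to the admin.

```python
import filecmp
import shutil
from pathlib import Path
from typing import List


def ship_archived_logs(archive_dir: Path, backup_dir: Path) -> List[str]:
    """Copy every file in archive_dir to backup_dir, then delete the archived copy.

    Any file present in archive_dir is treated as closed and ready to copy.
    It is removed only after the backup copy matches it byte for byte.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    for src in sorted(archive_dir.iterdir()):
        if not src.is_file():
            continue
        dst = backup_dir / src.name
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        if not filecmp.cmp(src, dst, shallow=False):
            raise IOError(f"backup copy of {src.name} does not match the archive")
        src.unlink()
        shipped.append(src.name)
    return shipped
```

No locking or "is it still being written" check is needed here, which is exactly the simplification that late hard-linking buys.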

      was (Author: m0nstermind):
    Creating the hard link that early is not good: some people back up commit logs to a safe location, and if the hard link is created before the commit log segment is actually closed, it is hard to determine whether the file is ready to copy. That's why in my code the hard link is created only after the commit log segment is closed and will never be written to again.
  
> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt, 1602-v2.txt
>
>

[jira] Commented: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919820#action_12919820 ] 

Jonathan Ellis commented on CASSANDRA-1602:
-------------------------------------------

can you rebase to 0.6 svn branch head?  http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.6/

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.4
>
>         Attachments: 1602-0.6.4.txt
>
>

[jira] Issue Comment Edited: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920453#action_12920453 ] 

Oleg Anastasyev edited comment on CASSANDRA-1602 at 10/13/10 1:52 AM:
----------------------------------------------------------------------

And you did not include org.apache.cassandra.tools.ReplayLogs in the v2 patch; it is necessary for bin/logreplay to run.

      was (Author: m0nstermind):
    And you left the org.apache.cassandra.tools.ReplayLogs class out of your patch.
  
> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt, 1602-v2.txt
>
>

[jira] Issue Comment Edited: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919915#action_12919915 ] 

Oleg Anastasyev edited comment on CASSANDRA-1602 at 10/11/10 1:27 PM:
----------------------------------------------------------------------

Attached patch rebased to current 0.6 branch in [^1602-cassandra0.6.txt]

      was (Author: m0nstermind):
    Attached patch rebased to current 0.6 branch
  
> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt
>
>

[jira] Updated: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1602:
--------------------------------------

    Fix Version/s:     (was: 0.6.4)
                   0.6.7

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt
>
>

[jira] Updated: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Anastasyev updated CASSANDRA-1602:
---------------------------------------

    Attachment: 1602-0.6.4.txt

Patch on original 0.6.4 version

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>         Attachments: 1602-0.6.4.txt
>
>

[jira] Updated: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Anastasyev updated CASSANDRA-1602:
---------------------------------------

    Description: 
A couple of people on the mailing list suggested that I share this patch (see discussion at http://comments.gmane.org/gmane.comp.db.cassandra.user/9423 ), which retains (archives) commit logs generated by Cassandra and restores data by rolling commit logs forward onto previously backed-up or snapshotted data files.

Here are instructions on how to use it, extracted from our internal wiki:

We rely on Cassandra's replication factor for disaster recovery.

But there is another problem: bugs in data manipulation logic can destroy data across the whole cluster, and the freshest backup to restore from is the last snapshot, which can be up to 24 hours old.

To deal with this, we decided to implement a snapshot + log archive backup strategy, i.e. we collect commit logs and snapshotted data files. In the event of data loss, whether due to hardware failure or a logic bug, we restore the last snapshot and roll forward all logs collected since that snapshot was taken.

Cassandra does not originally support log archiving, so I implemented it myself.

The idea is simple:
# As soon as a commit log file is no longer needed by Cassandra, a hard link (unix command "ln $1 $2") is created from the just-closed commit log file to the commit log archive directory. Both the commit log and the commit log archive must be on the same volume, of course.
# Some script (whose authoring I left to the admin) then takes files from the commit log archive dir and copies them over the network to a backup location. As soon as a file has been transferred, it can be safely deleted from the commit log archive dir.
# Don't forget there must also be some script (also authored by admins) that takes data snapshots from time to time using the "nodetool snapshot" command, available in the standard Cassandra distribution, and copies the snapshot files to the backup location.
## Creating a snapshot is a very light operation for Cassandra: under the hood it just hard-links the currently existing files into the "snapshot/<timestamp-millis>" directory. So the frequency is limited only by our ability to pull snapshotted data files over the network.
# To restore data, the admin must:
## stop the Cassandra instance
## remove all corrupted data files from the /data directory. Leave only the commit logs you want to roll forward in the /commitlog dir
## copy the last snapshot's data files to /data
## copy all archived commit log files to /commitlog (only files no older than the snapshot data files need to be copied; copying older ones does no harm but extends processing time)
## *without starting* the Cassandra instance, run the roll forward utility (bin/logreplay) with option -forced or -forcedcompaction and wait for it to complete
## then start the Cassandra node instance as usual.
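
The shipping script in step 2 is left to the admin, so the following is only a minimal sketch of what it could look like, not part of the patch. It assumes the backup location is reachable as a mounted path (for example an NFS mount); the function name ship_archived_logs and the ".part" temporary suffix are hypothetical. The point it illustrates is the ordering: an archived segment is deleted only after its copy is complete.

```python
import shutil
from pathlib import Path
from typing import List


def ship_archived_logs(archive_dir: Path, backup_dir: Path) -> List[str]:
    """Copy every file in archive_dir to backup_dir, then delete the archived copy.

    Returns the names of the shipped files in sorted order.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    for segment in sorted(p for p in archive_dir.iterdir() if p.is_file()):
        target = backup_dir / segment.name
        partial = target.with_name(target.name + ".part")
        # Copy under a temporary name so a half-written file is never
        # mistaken for a complete backup.
        shutil.copy2(segment, partial)
        if partial.stat().st_size != segment.stat().st_size:
            partial.unlink()
            raise IOError(f"short copy of {segment}")
        partial.replace(target)  # publish the complete copy
        segment.unlink()         # only now drop it from the archive dir
        shipped.append(segment.name)
    return shipped
```

Run periodically (for example from cron) against /commitlog/.archive. Because the archive entry is a hard link, deleting it does not affect any file Cassandra still holds under /commitlog.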

Log archiving is activated using the CommitLogArchive directive in storage-conf.xml:

{code:xml}
<CommitLogDirectory>/commitlog</CommitLogDirectory>
<CommitLogArchive>true</CommitLogArchive>
<DataFileDirectories>
  <DataFileDirectory>/data</DataFileDirectory>
</DataFileDirectories>
{code}
Log files will be archived to the <commit-dir>/.archive directory, i.e. /commitlog/.archive in the example above.

The archived log replay process is launched by running org.apache.cassandra.tools.ReplayLogs with option "-forced" locally on the node (use option "-forcedcompact" to run a major compaction right after the roll forward completes). I also made a script named <cassandra-dir>/bin/logreplay.
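
To make the restore procedure concrete, here is a hedged sketch of the file staging that precedes the replay. It is an illustration under stated assumptions, not code from the patch: the function name stage_restore is hypothetical, the directories are treated as flat (a real data directory may contain per-keyspace subdirectories), and instead of deleting the corrupted files it moves them into a sibling "<data-dir>.corrupted" directory, which is a swapped-in precaution rather than what the instructions say. The node must already be stopped. The function only returns the replay command described above; it does not run it.

```python
import shutil
from pathlib import Path
from typing import List


def stage_restore(data_dir, commitlog_dir, snapshot_dir,
                  archived_logs_dir, cassandra_dir) -> List[str]:
    """Stage snapshot files and archived commit logs; return the replay command."""
    data_dir, commitlog_dir = Path(data_dir), Path(commitlog_dir)

    # Move the existing (possibly corrupted) data files out of the way.
    quarantine = data_dir.with_name(data_dir.name + ".corrupted")
    quarantine.mkdir(exist_ok=True)
    for f in data_dir.iterdir():
        if f.is_file():
            shutil.move(str(f), str(quarantine / f.name))

    # Copy the last snapshot's data files into the data directory.
    for f in Path(snapshot_dir).iterdir():
        if f.is_file():
            shutil.copy2(f, data_dir / f.name)

    # Copy the archived commit logs next to any live commit logs.
    # Logs older than the snapshot do no harm; they only slow the replay.
    for f in Path(archived_logs_dir).iterdir():
        if f.is_file():
            shutil.copy2(f, commitlog_dir / f.name)

    # Command to run before starting the node again.
    return [str(Path(cassandra_dir) / "bin" / "logreplay"), "-forced"]
```

The returned list can be passed to subprocess.run(..., check=True); the node is started only after that command finishes.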


  was:
As a couple of people on the mailing list suggested (see discussion at http://comments.gmane.org/gmane.comp.db.cassandra.user/9423 ), I am sharing a patch to retain (archive) commit logs generated by Cassandra and restore data by rolling commit logs forward onto previously backed-up or snapshotted data files.




> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.4
>
>         Attachments: 1602-0.6.4.txt
>
>

[jira] Updated: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1602:
--------------------------------------

    Attachment: 1602-v2.txt

v2 attached.

The main difference is it cuts down on the special cases in CommitLog by creating a hard link to the new segment as soon as it is created.  Then the rest of the logic can proceed oblivious to archive mode.
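
Why linking at creation time is enough can be shown with a small sketch. This is not the v2 patch itself (which is Java); it only demonstrates the filesystem behaviour the approach relies on. The function name is hypothetical, and the ".archive" subdirectory is taken from the issue description, so whether v2 keeps that location is an assumption.

```python
import os
from pathlib import Path


def create_segment_with_archive_link(commitlog_dir: Path, name: str) -> Path:
    """Create an empty segment and immediately hard-link it into the archive dir."""
    archive_dir = commitlog_dir / ".archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    segment = commitlog_dir / name
    segment.touch()
    # Both names now refer to the same file, so every later write to the
    # segment is visible through the archive name as well.
    os.link(segment, archive_dir / name)
    return segment
```

After this call the commit log code can write to the segment and later delete it as usual. The data survives under the archive name until the shipping script removes it, which is why no archive-specific branches are needed elsewhere.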

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt, 1602-v2.txt
>
>

[jira] Issue Comment Edited: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920413#action_12920413 ] 

Jonathan Ellis edited comment on CASSANDRA-1602 at 10/12/10 9:11 PM:
---------------------------------------------------------------------

Thanks Oleg!

v2 attached.

The main difference is it cuts down on the special cases in CommitLog by creating a hard link to the new segment as soon as it is created.  Then the rest of the logic can proceed oblivious to archive mode.

      was (Author: jbellis):
    v2 attached.

The main difference is it cuts down on the special cases in CommitLog by creating a hard link to the new segment as soon as it is created.  Then the rest of the logic can proceed oblivious to archive mode.
  
> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt, 1602-v2.txt
>
>
> Since a couple of people on the mailing list suggested that I share this patch (see the discussion at http://comments.gmane.org/gmane.comp.db.cassandra.user/9423 ), here it is: it retains (archives) the commit logs generated by Cassandra and restores data by rolling those commit logs forward onto previously backed-up or snapshotted data files.
> Here are instructions on how to use it, extracted from our internal wiki:
> We rely on the Cassandra replication factor for disaster recovery.
> But there is another problem: bugs in data manipulation logic can destroy data across the whole cluster, and the freshest backup to restore from is the last snapshot, which can be up to 24 hours old.
> To address this, we decided to implement a snapshot + log archive backup strategy, i.e. we collect commit logs and snapshotted data files. In the event of data loss, whether due to hardware failure or a logical bug, we restore the last snapshot and roll forward onto it all logs collected since the snapshot was taken.
> Cassandra does not support log archiving out of the box, so I implemented it myself.
> The idea is simple:
> # As soon as a commit log file is no longer needed by Cassandra, a hard link (unix command "ln $1 $2") is created from the just-closed commit log file to the commit log archive directory. The commit log and the commit log archive must be on the same volume, since hard links cannot cross volumes.
> # A script (which the admin is expected to write) then takes files from the commit log archive directory and copies them over the network to a backup location. As soon as a file has been transferred, it can safely be deleted from the commit log archive directory.
> # Don't forget that there must also be a script (again written by the admin) that takes data snapshots from time to time using the "nodetool snapshot" command, available in the standard Cassandra distribution, and copies the snapshot files to the backup location.
> ## Creating a snapshot is a very light operation for Cassandra: under the hood it just hard-links the currently existing files into a "snapshot/<timestamp-millis>" directory. So the snapshot frequency is limited only by our ability to pull the snapshotted data files over the network.
> # To restore data, the admin must:
> ## stop the Cassandra instance
> ## remove all corrupted data files from the /data directory, and leave in the /commitlog directory only the commit logs you want to roll forward
> ## copy the data files of the last snapshot to /data
> ## copy all archived commit log files to /commitlog (only files no older than the snapshot data files need to be copied; copying older ones does no harm but extends processing time)
> ## *without starting* the Cassandra instance, run the roll forward utility (bin/logreplay) with option -forced or -forcedcompaction and wait for it to complete
> ## then start the Cassandra node instance as usual.
> Log archiving is activated using the CommitLogArchive directive in storage-conf.xml:
> {code:xml}
> <CommitLogDirectory>/commitlog</CommitLogDirectory>
> <CommitLogArchive>true</CommitLogArchive>
> <DataFileDirectories>
> <DataFileDirectory>/data</DataFileDirectory>
> </DataFileDirectories>
> {code}
> Log files will be archived to the <commit-dir>/.archive directory, i.e. in the example above, to the /commitlog/.archive directory.
> The archived log replay process is launched by running org.apache.cassandra.tools.ReplayLogs with option "-forced" locally on the node (use option "-forcedcompact" to run a major compaction right after the log roll forward completes). I also added a script for this named <cassandra-dir>/bin/logreplay.
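
The archive-shipping script that the description leaves to the admin could look roughly like the sketch below. It is not part of the patch. The function name is hypothetical, `backup_dir` stands in for the backup location (for example a mounted network path), and the sketch assumes the archive directory contains only closed commit log files, as in the description quoted above.

```python
import os
import shutil


def ship_archived_logs(archive_dir, backup_dir):
    """Copy each archived log to backup_dir and delete it only after the copy is complete.

    Returns the names of the files that were shipped, in sorted order.
    """
    os.makedirs(backup_dir, exist_ok=True)
    shipped = []
    for name in sorted(os.listdir(archive_dir)):
        src = os.path.join(archive_dir, name)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(backup_dir, name)
        tmp = dst + ".part"
        # Copy under a temporary name so a half-written file is never
        # mistaken for a finished backup.
        shutil.copy2(src, tmp)
        os.replace(tmp, dst)
        # Delete the archived file only once the copy has the same size.
        if os.path.getsize(dst) == os.path.getsize(src):
            os.remove(src)
            shipped.append(name)
    return shipped
```

Run periodically (for example from cron) with the `.archive` directory as `archive_dir`. A file whose copy fails stays in the archive directory and is picked up again on the next run.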


[jira] Commented: (CASSANDRA-1602) Commit Log archivation and rolling forward utility (AKA Retaining commit logs)

Posted by "Oleg Anastasyev (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920453#action_12920453 ] 

Oleg Anastasyev commented on CASSANDRA-1602:
--------------------------------------------

Also, the org.apache.cassandra.tools.ReplayLogs class is missing from your patch.

