Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/03/20 15:30:33 UTC

[GitHub] [accumulo-website] milleruntime commented on a change in pull request #163: Initial design doc for apache/accumulo#1044

milleruntime commented on a change in pull request #163: Initial design doc for apache/accumulo#1044
URL: https://github.com/apache/accumulo-website/pull/163#discussion_r267401239
 
 

 ##########
 File path: design/system-snapshot.md
 ##########
 @@ -0,0 +1,136 @@
+---
+title: Accumulo System Snapshot Design
+---
+
+## Disclaimer
+
+This is a work-in-progress design document for a feature that may never be
+implemented.  It represents one possible design for {% ghi 1044 %}, but it
+does not have to be the only one.
+
+## Overview
+
+Being able to take snapshots of an entire Accumulo system and roll back to a
+snapshot would support many administrative use cases (outlined in
+{% ghi 1044 %}).  This design outlines one possible way to implement snapshots.
+
+The goal behavior is that a snapshot contains all data that was flushed or
+bulk imported before the snapshot operation started.  A snapshot may also
+contain data written to a table while the snapshot operation is in progress.
+
+Each snapshot would be composed of a tree of immutable files in DFS.  This
+tree would have the following three levels.
+
+ * **Root node**: A snapshot of all ZK metadata. This would be accomplished by
+    copying all data in ZK into a file in DFS. The copy must be made safely in
+    the presence of concurrent ZK operations. If {% ghi 936 %} is implemented,
+    then a snapshot of ZooKeeper is a snapshot of everything needed.
+ * **Internal nodes**: The root node would point to a version of the root
+    tablet in DFS. The root tablet snapshot would point to a version of the
+    metadata tablets in DFS.
+ * **Leaf nodes**: The metadata nodes would point to user table data in DFS.
+    This would form a snapshot of the data in each table.
+
+In addition to pointing to other files, the root snapshot node would also
+store information that lives in ZooKeeper, such as per-table configuration.
+
+Snapshots would be stored as files in a `/accumulo/snapshots/` directory with a
+copy on every volume.
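+
+As an illustration, the snapshot directory might look like the layout below.
+All file and directory names here are hypothetical, not a final layout, and
+each volume would hold its own copy.
+
+```
+/accumulo/snapshots/
+  pre-upgrade/        # snapshot name chosen by the user
+    zk-snapshot       # root node: dump of all ZK metadata
+  weekly-1/
+    zk-snapshot
+```
+
+The internal and leaf nodes would not be copied under this directory; they
+are the existing immutable root, metadata, and user table files elsewhere in
+DFS that the root node (transitively) references.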
+
+The Accumulo GC would read all snapshots when deciding which files to delete.
+The GC would also need to read the blip (bulk load in progress) markers in
+each snapshot.
+
+Storing all Accumulo snapshot data in DFS would work nicely with DFS snapshots.
+Taking an Accumulo snapshot followed by a DFS snapshot could avoid certain
+catastrophic administrative errors.  However, since Accumulo can work across
+multiple DFS instances, there is no requirement to use DFS snapshots.
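+
+As an illustration, on a single-HDFS deployment the two could be paired as in
+the sketch below.  The HDFS snapshot calls are real Hadoop client APIs; the
+Accumulo snapshot step is hypothetical (it is the operation this document
+designs), and `/accumulo` is assumed to be the Accumulo volume.
+
+```java
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+
+public class PairedSnapshot {
+  public static void main(String[] args) throws Exception {
+    Path volume = new Path("/accumulo"); // assumed Accumulo volume
+    FileSystem fs = FileSystem.get(new Configuration());
+    DistributedFileSystem dfs = (DistributedFileSystem) fs; // HDFS only
+
+    dfs.allowSnapshot(volume); // one-time admin step per snapshottable dir
+
+    // ... take the Accumulo snapshot here first (API sketched below) ...
+
+    // then freeze the same files at the DFS level
+    dfs.createSnapshot(volume, "pre-upgrade");
+  }
+}
+```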
+
+## User operations
+
+Users would be able to perform the following operations.
+
+ * Create a named snapshot.  The name must be unique.
+ * Interrogate snapshot information.
+   * List all snapshots.
+   * List the files used by a snapshot.
+   * Print snapshot details (data version, Accumulo version that created the
+     snapshot, tables, table config, system config, FATE ops, etc.). This
+     should be printed in a form that can be diffed using a text diff tool.
+   * Analyze the space used by snapshots.  This could output a table that
+     shows exclusive and shared space.
+ * Restore Accumulo back to the state of a previous snapshot.
+ * Delete a snapshot.
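+
+A rough sketch of what the user API could look like follows.  All type and
+method names are hypothetical, chosen only to mirror the operations listed
+above; none of them exist in Accumulo today.
+
+```java
+import java.util.List;
+import java.util.Set;
+
+// Hypothetical user-facing snapshot API (a sketch, not a final design).
+public interface SnapshotOperations {
+  // Create a named snapshot; fails if the name is already in use.
+  void create(String name);
+
+  // List the names of all snapshots.
+  List<String> list();
+
+  // List the DFS files referenced by a snapshot.
+  Set<String> getFiles(String name);
+
+  // Diff-friendly text: data version, Accumulo version, tables, config, etc.
+  String getDetails(String name);
+
+  // Roll the system back to the state captured by a snapshot.
+  void restore(String name);
+
+  // Delete a snapshot.
+  void delete(String name);
+}
+```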
+
+## Implementation
+
+### Creating a snapshot
+
+A user API and related shell command would create snapshots.  The
+implementation behind this API would create a FATE operation to perform the
+snapshot. The FATE op would do the following:
+
+ 1. get snapshot lock in ZK (prevents concurrent snapshot operations)
+ 1. ensure snapshot name is not in use
+ 1. pause changing props in ZK
+ 1. pause non-snapshot FATE ops (let them finish their current step, but do
+    not execute the next step) and temporarily stop accepting new FATE ops
+ 1. pause Accumulo GC
+ 1. flush metadata table
+ 1. flush root table (probably need to fix {% ghi 798 %})
+ 1. create the snapshot by copying ZK to DFS; this copy is the entire
+    snapshot, assuming {% ghi 936 %} is done (see the sketch after this list)
+ 1. unpause everything; when the GC is unpaused, it should start fresh,
+    reading all available snapshots
+ 1. release snapshot lock
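+
+A minimal sketch of step 8 follows, assuming the standard ZooKeeper and
+Hadoop client APIs.  The class name and file encoding are hypothetical, and
+the consistency of the copy comes from the pauses above, not from this code.
+
+```java
+import java.io.DataOutputStream;
+import java.io.IOException;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.ZooKeeper;
+
+// Hypothetical helper that dumps a ZooKeeper subtree into a single DFS file.
+public class ZkSnapshotWriter {
+
+  // Depth-first walk: write each node's path and data, then its children.
+  static void dump(ZooKeeper zk, String path, DataOutputStream out)
+      throws KeeperException, InterruptedException, IOException {
+    byte[] data = zk.getData(path, false, null); // no watch, stat ignored
+    out.writeUTF(path);
+    out.writeInt(data == null ? 0 : data.length);
+    if (data != null)
+      out.write(data);
+    for (String child : zk.getChildren(path, false))
+      dump(zk, path.equals("/") ? "/" + child : path + "/" + child, out);
+  }
+
+  // Write the dump to a snapshot file in DFS.
+  public static void snapshot(ZooKeeper zk, FileSystem fs, Path file)
+      throws Exception {
+    try (DataOutputStream out = fs.create(file)) {
+      dump(zk, "/accumulo", out);
+    }
+  }
+}
+```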
+
+A user could optionally flush some or all tables before taking a snapshot.
+
+More thought needs to be given to write-ahead logs.  This design ignores them
+and only concerns itself with flushed data.
+
+Pausing FATE ops may not be needed, and more design work is needed in the
+general area of FATE ops. Ideally the snapshot operation would be extremely
+fast, but pausing FATE ops could be very slow.  The reason for pausing is to
+get a consistent view of FATE and the root and metadata tables.
 
 Review comment:
   Wouldn't it simplify things drastically to take a snapshot of only data at rest, ignoring any currently running FATE operations that haven't completed?
