Posted to commits@accumulo.apache.org by el...@apache.org on 2017/04/21 02:47:05 UTC

[1/6] accumulo git commit: ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual

Repository: accumulo
Updated Branches:
  refs/heads/1.7 6b2e430dc -> ddc6203ad
  refs/heads/1.8 cf6c0ff09 -> bca75d356
  refs/heads/master 05496627c -> 92c45a896


ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual

Signed-off-by: Josh Elser <el...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/ddc6203a
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/ddc6203a
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/ddc6203a

Branch: refs/heads/1.7
Commit: ddc6203ad0e5ca9bbe553b5bad1f2498af634a7e
Parents: 6b2e430
Author: Sean Busbey <bu...@cloudera.com>
Authored: Thu Apr 20 22:39:56 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Apr 20 22:42:42 2017 -0400

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 30 +++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/ddc6203a/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/troubleshooting.txt b/docs/src/main/asciidoc/chapters/troubleshooting.txt
index cd2923c..359ed67 100644
--- a/docs/src/main/asciidoc/chapters/troubleshooting.txt
+++ b/docs/src/main/asciidoc/chapters/troubleshooting.txt
@@ -666,6 +666,35 @@ original and the new instances, but it can serve as a reference.
 rfiles to allow references in the metadata table and in the tablet servers to be
 resolved. Rebuild the metadata table if the corrupt files are metadata files.
 
+*Write-Ahead Log (WAL) File Corruption*
+
+In certain versions of Accumulo, a corrupt WAL file (caused by HDFS corruption
+or a bug in Accumulo that created the file) can block the successful recovery
+of one or more tablets. Accumulo can get stuck in a loop trying to recover the
+WAL file, never succeeding.
+
+In cases where the WAL file's original contents are unrecoverable or some degree
+of data loss is acceptable (beware if the WAL file contains updates to the Accumulo
+metadata table!), the following process can be followed to create a valid, empty
+WAL file. Run the following commands as the Accumulo unix user (to ensure that
+the files have the proper permissions in HDFS):
+
+  $ echo -n -e '--- Log File Header (v2) ---\x00\x00\x00\x00' > empty.wal
+
+The above creates a file with the text "--- Log File Header (v2) ---" followed by
+four zero bytes. You should verify the contents of the file with a hexdump tool.
+
+Then, place this empty WAL in HDFS and replace the corrupt WAL file in HDFS
+with the empty WAL.
+
+  $ hdfs dfs -moveFromLocal empty.wal /user/accumulo/empty.wal
+  $ hdfs dfs -mv /user/accumulo/empty.wal /accumulo/wal/tserver-4.example.com+10011/26abec5b-63e7-40dd-9fa1-b8ad2436606e
+
+After the corrupt WAL file has been replaced, the system should automatically recover.
+It may be necessary to restart the Accumulo Master process, because an exponential
+backoff policy is used, which could lead to a long wait before Accumulo
+tries to reload the WAL file.
+
 [[zookeeper_failure]]
 #### ZooKeeper Failure
 *Q*: I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. How can I recover my Accumulo instance?
@@ -765,4 +794,3 @@ For example, if you see multiple files with +M+ prefixes, the tablet is, or was,
 maximum file limit, so it began merging memory updates with files to keep the file count reasonable.  This
 slows down ingest performance, so knowing there are many files like this tells you that the system
 is struggling to keep up with ingest vs the compaction strategy which reduces the number of files.
-
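
The file-creation and verification step described in the patch above can be sketched as follows. This is a minimal illustration, not part of the commit: it swaps `echo -n -e` for `printf`, since `echo` flag handling varies between shells, and it assumes the four trailing bytes are the zero bytes written by the `\x00` escapes in the original command. The variable names and the local file name are only for illustration.

```shell
#!/bin/sh
# Sketch: build the empty WAL locally and check its bytes before touching HDFS.
# The header string comes from the manual text; the variable names are hypothetical.
out=empty.wal
header='--- Log File Header (v2) ---'

# Pass the header as an argument so its leading dashes are not parsed as
# printf options, then append four NUL bytes using octal escapes.
printf '%s\000\000\000\000' "$header" > "$out"

# The header is 28 characters, so the file should be 32 bytes long.
size=$(wc -c < "$out" | tr -d ' ')
echo "size=$size"

# Dump every byte for visual inspection (the "hexdump tool" step).
od -A d -c "$out"

# Extract the last four bytes as hex so they can be compared mechanically.
tail_hex=$(tail -c 4 "$out" | od -An -v -t x1 | tr -d ' \n')
echo "tail=$tail_hex"
```

If the reported size is not 32 or the trailing bytes are not all zero, the file should be regenerated before it is moved into HDFS.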


[6/6] accumulo git commit: Merge branch '1.8'

Posted by el...@apache.org.
Merge branch '1.8'


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/92c45a89
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/92c45a89
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/92c45a89

Branch: refs/heads/master
Commit: 92c45a8960768358f203892b9c268c3cf42db427
Parents: 0549662 bca75d3
Author: Josh Elser <el...@apache.org>
Authored: Thu Apr 20 22:46:54 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Apr 20 22:46:54 2017 -0400

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 29 ++++++++++++++++++++
 1 file changed, 29 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/92c45a89/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------


[4/6] accumulo git commit: Merge branch '1.7' into 1.8

Posted by el...@apache.org.
Merge branch '1.7' into 1.8


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/bca75d35
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/bca75d35
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/bca75d35

Branch: refs/heads/master
Commit: bca75d356cbec38a21b8ca7926ca356abfae0c70
Parents: cf6c0ff ddc6203
Author: Josh Elser <el...@apache.org>
Authored: Thu Apr 20 22:44:00 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Apr 20 22:44:00 2017 -0400

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 29 ++++++++++++++++++++
 1 file changed, 29 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/bca75d35/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------


[5/6] accumulo git commit: Merge branch '1.7' into 1.8

Posted by el...@apache.org.
Merge branch '1.7' into 1.8


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/bca75d35
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/bca75d35
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/bca75d35

Branch: refs/heads/1.8
Commit: bca75d356cbec38a21b8ca7926ca356abfae0c70
Parents: cf6c0ff ddc6203
Author: Josh Elser <el...@apache.org>
Authored: Thu Apr 20 22:44:00 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Apr 20 22:44:00 2017 -0400

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 29 ++++++++++++++++++++
 1 file changed, 29 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/bca75d35/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------


[3/6] accumulo git commit: ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual

Posted by el...@apache.org.
ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual

Signed-off-by: Josh Elser <el...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/ddc6203a
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/ddc6203a
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/ddc6203a

Branch: refs/heads/master
Commit: ddc6203ad0e5ca9bbe553b5bad1f2498af634a7e
Parents: 6b2e430
Author: Sean Busbey <bu...@cloudera.com>
Authored: Thu Apr 20 22:39:56 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Apr 20 22:42:42 2017 -0400

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 30 +++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/ddc6203a/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------


[2/6] accumulo git commit: ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual

Posted by el...@apache.org.
ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual

Signed-off-by: Josh Elser <el...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/ddc6203a
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/ddc6203a
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/ddc6203a

Branch: refs/heads/1.8
Commit: ddc6203ad0e5ca9bbe553b5bad1f2498af634a7e
Parents: 6b2e430
Author: Sean Busbey <bu...@cloudera.com>
Authored: Thu Apr 20 22:39:56 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Apr 20 22:42:42 2017 -0400

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 30 +++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/ddc6203a/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------