Posted to commits@hbase.apache.org by nd...@apache.org on 2017/12/01 04:41:53 UTC

[4/6] hbase git commit: updating docs from master

http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/backup_restore.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/backup_restore.adoc b/src/main/asciidoc/_chapters/backup_restore.adoc
new file mode 100644
index 0000000..a9dbcf5
--- /dev/null
+++ b/src/main/asciidoc/_chapters/backup_restore.adoc
@@ -0,0 +1,912 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[backuprestore]]
+= Backup and Restore
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+[[br.overview]]
+== Overview
+
+Backup and restore is a standard operation provided by many databases. An effective backup and restore
+strategy helps ensure that users can recover data in case of unexpected failures. The HBase backup and restore
+feature helps ensure that enterprises using HBase as a canonical data repository can recover from catastrophic
+failures. Another important feature is the ability to restore the database to a particular
+point-in-time, commonly referred to as a snapshot.
+
+The HBase backup and restore feature provides the ability to create full backups and incremental backups on
+tables in an HBase cluster. The full backup is the foundation on which incremental backups are applied
+to build iterative snapshots. Incremental backups can be run on a schedule to capture changes over time,
+for example by using a Cron task. Incremental backups are more cost-effective than full backups because they only capture
+the changes since the last backup, and they also enable administrators to restore the database to any prior incremental backup. Furthermore, the
+utilities also enable table-level data backup and recovery if you do not want to restore the entire dataset
+of the backup.
+
+The backup and restore feature supplements the HBase Replication feature. While HBase replication is ideal for
+creating "hot" copies of the data (where the replicated data is immediately available for query), the backup and
+restore feature is ideal for creating "cold" copies of data (where a manual step must be taken to restore the system).
+Previously, users only had the ability to create full backups via the ExportSnapshot functionality. The incremental
+backup implementation is the novel improvement over the previous "art" provided by ExportSnapshot.
+
+[[br.terminology]]
+== Terminology
+
+The backup and restore feature introduces new terminology which can be used to understand how control flows through the
+system.
+
+* _Backup_: A logical unit of data and metadata which can restore a table to its state at a specific point in time.
+* _Full backup_: A type of backup which wholly encapsulates the contents of the table at a point in time.
+* _Incremental backup_: A type of backup which contains the changes in a table since a full backup.
+* _Backup set_: A user-defined name which references one or more tables over which a backup can be executed.
+* _Backup ID_: A unique name which identifies one backup from the rest, e.g. `backupId_1467823988425`.
+
+[[br.planning]]
+== Planning
+
+There are some common strategies which can be used to implement backup and restore in your environment. The following section
+shows how these strategies are implemented and identifies potential tradeoffs with each.
+
+WARNING: The backup and restore tools have not been tested on Transparent Data Encryption (TDE) enabled HDFS clusters.
+This is related to the open issue link:https://issues.apache.org/jira/browse/HBASE-16178[HBASE-16178].
+
+[[br.intracluster.backup]]
+=== Backup within a cluster
+
+This strategy stores the backups on the same cluster as where the backup was taken. This approach is only appropriate for testing
+as it does not provide any additional safety on top of what the software itself already provides.
+
+.Intra-Cluster Backup
+image::backup-intra-cluster.png[]
+
+[[br.dedicated.cluster.backup]]
+=== Backup using a dedicated cluster
+
+This strategy provides greater fault tolerance and provides a path towards disaster recovery. In this setting, you will
+store the backup on a separate HDFS cluster by supplying the backup destination cluster’s HDFS URL to the backup utility.
+You should consider backing up to a different physical location, such as a different data center.
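+
+For example, a full backup can be pointed at the dedicated cluster simply by using that cluster's NameNode in the
+backup destination path. The host name and path below are hypothetical, and the `create` command itself is covered
+later in this chapter.
+
+[source]
+----
+$ hbase backup create full hdfs://backup-namenode:8020/backups/hbase -t SALES2  # backup-namenode is a placeholder
+----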
+
+Typically, a backup-dedicated HDFS cluster uses a more economical hardware profile to save money.
+
+.Dedicated HDFS Cluster Backup
+image::backup-dedicated-cluster.png[]
+
+[[br.cloud.or.vendor.backup]]
+=== Backup to the Cloud or a storage vendor appliance
+
+Another approach to safeguarding HBase incremental backups is to store the data on provisioned, secure servers that belong
+to third-party vendors and that are located off-site. The vendor can be a public cloud provider or a storage vendor who uses
+a Hadoop-compatible file system, such as Amazon S3 or another HDFS-compatible destination.
+
+.Backup to Cloud or Vendor Storage Solutions
+image::backup-cloud-appliance.png[]
+
+NOTE: The HBase backup utility does not support backup to multiple destinations. A workaround is to manually create copies
+of the backup files from HDFS or S3.
+
+[[br.initial.setup]]
+== First-time configuration steps
+
+This section contains the necessary configuration changes that must be made in order to use the backup and restore feature.
+As this feature makes significant use of the MapReduce framework on YARN to parallelize these I/O-heavy operations, configuration
+changes extend outside of just `hbase-site.xml`.
+
+=== Allow the "hbase" system user in YARN
+
+The YARN *container-executor.cfg* configuration file must have the following property setting: _allowed.system.users=hbase_. No spaces
+are allowed in entries of this configuration file.
+
+WARNING: Skipping this step will result in runtime errors when executing the first backup tasks.
+
+*Example of a valid container-executor.cfg file for backup and restore:*
+
+[source]
+----
+yarn.nodemanager.log-dirs=/var/log/hadoop/mapred
+yarn.nodemanager.linux-container-executor.group=yarn
+banned.users=hdfs,yarn,mapred,bin
+allowed.system.users=hbase
+min.user.id=500
+----
+
+=== HBase specific changes
+
+Add the following properties to hbase-site.xml and restart HBase if it is already running.
+
+NOTE: The ",..." is an ellipsis meant to imply that this is a comma-separated list of values, not literal text which should be added to hbase-site.xml.
+
+[source]
+----
+<property>
+  <name>hbase.backup.enable</name>
+  <value>true</value>
+</property>
+<property>
+  <name>hbase.master.logcleaner.plugins</name>
+  <value>org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...</value>
+</property>
+<property>
+  <name>hbase.procedure.master.classes</name>
+  <value>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager,...</value>
+</property>
+<property>
+  <name>hbase.procedure.regionserver.classes</name>
+  <value>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager,...</value>
+</property>
+<property>
+  <name>hbase.coprocessor.region.classes</name>
+  <value>org.apache.hadoop.hbase.backup.BackupObserver,...</value>
+</property>
+<property>
+  <name>hbase.master.hfilecleaner.plugins</name>
+  <value>org.apache.hadoop.hbase.backup.BackupHFileCleaner,...</value>
+</property>
+----
+
+== Backup and Restore commands
+
+This section covers the command-line utilities that administrators run to create, restore, and merge backups. Tools to
+inspect details on specific backup sessions are covered in the next section, <<br.administration,Administration of Backup Images>>.
+
+Run the command `hbase backup help <command>` to access the online help that provides basic information about a command
+and its options. The information below is also available through this help message for each command.
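+
+For example, the following prints the help text for the `create` command described below:
+
+[source]
+----
+$ hbase backup help create
+----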
+
+// hbase backup create
+
+[[br.creating.complete.backup]]
+### Creating a Backup Image
+
+[NOTE]
+====
+For HBase clusters also using Apache Phoenix: include the SQL system catalog tables in the backup. In the event that you
+need to restore the HBase backup, access to the system catalog tables enables you to resume Phoenix interoperability with the
+restored data.
+====
+
+The first step in running the backup and restore utilities is to perform a full backup and to store the data in a separate image
+from the source. At a minimum, you must do this to get a baseline before you can rely on incremental backups.
+
+Run the following command as HBase superuser:
+
+[source]
+----
+hbase backup create <type> <backup_path>
+----
+
+After the command finishes running, the console prints a SUCCESS or FAILURE status message. The SUCCESS message includes a backup ID.
+The backup ID is the Unix time (also known as Epoch time) that the HBase master received the backup request from the client.
+
+[TIP]
+====
+Record the backup ID that appears at the end of a successful backup. In case the source cluster fails and you need to recover the
+dataset with a restore operation, having the backup ID readily available can save time.
+====
+
+[[br.create.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_type_::
+  The type of backup to execute: _full_ or _incremental_. As a reminder, an _incremental_ backup requires a _full_ backup to
+  already exist.
+
+_backup_path_::
+  The _backup_path_ argument specifies the full filesystem URI of where to store the backup image. Valid prefixes
+  are _hdfs:_, _webhdfs:_, _gpfs:_, and _s3fs:_.
+
+[[br.create.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+_-t <table_name[,table_name]>_::
+  A comma-separated list of tables to back up. If no tables are specified, all tables are backed up. No regular-expression or
+  wildcard support is present; all table names must be explicitly listed. See <<br.using.backup.sets,Backup Sets>> for more
+  information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these
+  named options is required.
+
+_-s <backup_set_name>_::
+  Identify tables to back up based on a backup set. See <<br.using.backup.sets,Using Backup Sets>> for the purpose and usage
+  of backup sets. Mutually exclusive with the _-t_ option.
+
+_-w <number_workers>_::
+  (Optional) Specifies the number of parallel workers to copy data to the backup destination. Backups are currently executed by MapReduce jobs
+  so this value corresponds to the number of Mappers that will be spawned by the job.
+
+_-b <bandwidth_per_worker>_::
+  (Optional) Specifies the bandwidth of each worker in MB per second.
+
+_-d_::
+  (Optional) Enables "DEBUG" mode which prints additional logging about the backup creation.
+
+_-q <name>_::
+  (Optional) The name of the YARN queue in which the MapReduce job that creates the backup should be executed. This option
+  is useful to prevent backup tasks from stealing resources away from other MapReduce jobs of high importance.
+
+[[br.usage.examples]]
+#### Example usage
+
+[source]
+----
+$ hbase backup create full hdfs://host5:8020/data/backup -t SALES2,SALES3 -w 3
+----
+
+This command creates a full backup image of two tables, SALES2 and SALES3, in the HDFS instance whose NameNode is host5:8020,
+in the path _/data/backup_. The _-w_ option specifies that no more than three parallel workers complete the operation.
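+
+Once this full backup exists, later changes to the same tables can be captured with an incremental backup against the
+same destination. This is a sketch which simply reuses the hypothetical `host5` cluster and tables from the example above:
+
+[source]
+----
+$ hbase backup create incremental hdfs://host5:8020/data/backup -t SALES2,SALES3
+----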
+
+// hbase backup restore
+
+[[br.restoring.backup]]
+### Restoring a Backup Image
+
+Run the following command as an HBase superuser. You can only restore a backup on a running HBase cluster because the data must be
+redistributed to the RegionServers for the operation to complete successfully.
+
+[source]
+----
+hbase restore <backup_path> <backup_id>
+----
+
+[[br.restore.positional.args]]
+#### Positional Command-Line Arguments
+
+_backup_path_::
+  The _backup_path_ argument specifies the full filesystem URI of where the backup image is stored. Valid prefixes
+  are _hdfs:_, _webhdfs:_, _gpfs:_, and _s3fs:_.
+
+_backup_id_::
+  The backup ID that uniquely identifies the backup image to be restored.
+
+
+[[br.restore.named.args]]
+#### Named Command-Line Arguments
+
+_-t <table_name[,table_name]>_::
+  A comma-separated list of tables to restore. See <<br.using.backup.sets,Backup Sets>> for more
+  information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these
+  named options is required.
+
+_-s <backup_set_name>_::
+  Identify tables to restore based on a backup set. See <<br.using.backup.sets,Using Backup Sets>> for the purpose and usage
+  of backup sets. Mutually exclusive with the _-t_ option.
+
+_-q <name>_::
+  (Optional) The name of the YARN queue in which the MapReduce job that performs the restore should be executed. This option
+  is useful to prevent restore tasks from stealing resources away from other MapReduce jobs of high importance.
+
+_-c_::
+  (Optional) Perform a dry-run of the restore. The actions are checked, but not executed.
+
+_-m <target_tables>_::
+  (Optional) A comma-separated list of tables to restore into. If this option is not provided, the original table name is used. When
+  this option is provided, there must be an equal number of entries provided in the `-t` option.
+
+_-o_::
+  (Optional) Overwrites the target table for the restore if the table already exists.
+
+
+[[br.restore.usage]]
+#### Example of Usage
+
+[source]
+----
+hbase restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2
+----
+
+This command restores two tables of an incremental backup image. In this example:
+* `/tmp/backup_incremental` is the path to the directory containing the backup image.
+* `backupId_1467823988425` is the backup ID.
+* `mytable1` and `mytable2` are the names of tables in the backup image to be restored.
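+
+The _-m_ option described above can be used to restore the same image into differently named tables, leaving the
+originals untouched. A sketch, where the target table names are purely illustrative:
+
+[source]
+----
+# mytable1_copy and mytable2_copy are illustrative target names
+$ hbase restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2 -m mytable1_copy,mytable2_copy
+----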
+
+// hbase backup merge
+
+[[br.merge.backup]]
+### Merging Incremental Backup Images
+
+This command can be used to merge two or more incremental backup images into a single incremental
+backup image, consolidating multiple small images into one larger image. For example, hourly incremental
+backups can be merged into a daily incremental backup image, or daily incremental backups into a weekly
+incremental backup.
+
+[source]
+----
+$ hbase backup merge <backup_ids>
+----
+
+[[br.merge.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_ids_::
+  A comma-separated list of incremental backup image IDs that are to be combined into a single image.
+
+[[br.merge.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.merge.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup merge backupId_1467823988425,backupId_1467827588425
+----
+
+// hbase backup set
+
+[[br.using.backup.sets]]
+### Using Backup Sets
+
+Backup sets can ease the administration of HBase data backups and restores by reducing the amount of repetitive input
+of table names. You can group tables into a named backup set with the `hbase backup set add` command. You can then use
+the _-s_ option to invoke the name of a backup set in the `hbase backup create` or `hbase backup restore` commands rather than
+listing every table in the group individually. You can have multiple backup sets.
+
+NOTE: Note the differentiation between the `hbase backup set add` command and the _-s_ option. The `hbase backup set add`
+command must be run before using the _-s_ option in a different command because backup sets must be named and defined
+before they can be used as a shortcut.
+
+If you run the `hbase backup set add` command and specify a backup set name that does not yet exist on your system, a new set
+is created. If you run the command with the name of an existing backup set, then the tables that you specify are added
+to the set.
+
+In this command, the backup set name is case-sensitive.
+
+NOTE: The metadata of backup sets are stored within HBase. If you do not have access to the original HBase cluster with the
+backup set metadata, then you must specify individual table names to restore the data.
+
+To create a backup set, run the following command as the HBase superuser:
+
+[source]
+----
+$ hbase backup set <subcommand> <backup_set_name> <tables>
+----
+
+[[br.set.subcommands]]
+#### Backup Set Subcommands
+
+The following list details subcommands of the `hbase backup set` command.
+
+NOTE: You must enter one (and no more than one) of the following subcommands after `hbase backup set` to complete an operation.
+Also, the backup set name is case-sensitive in the command-line utility.
+
+_add_::
+  Adds table[s] to a backup set. Specify a _backup_set_name_ value after this argument to create a backup set.
+
+_remove_::
+  Removes tables from the set. Specify the tables to remove in the tables argument.
+
+_list_::
+  Lists all backup sets.
+
+_describe_::
+  Displays a description of a backup set. The information includes whether the set has full
+  or incremental backups, start and end times of the backups, and a list of the tables in the set. This subcommand must be
+  followed by a valid _backup_set_name_ value.
+
+_delete_::
+  Deletes a backup set. Enter the value for the _backup_set_name_ option directly after the `hbase backup set delete` command.
+
+[[br.set.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_set_name_::
+  Use to assign or invoke a backup set name. The backup set name must contain only printable characters and cannot have any spaces.
+
+_tables_::
+  List of tables (or a single table) to include in the backup set. Enter the table names as a comma-separated list. If no tables
+  are specified, all tables are included in the set.
+
+TIP: Maintain a log or other record of the case-sensitive backup set names and the corresponding tables in each set on a separate
+cluster or at a remote site as part of your backup strategy. This information can help you in case of failure on the primary cluster.
+
+[[br.set.usage]]
+#### Example of Usage
+
+[source]
+----
+$ hbase backup set add Q1Data TEAM_3,TEAM_4
+----
+
+Depending on the environment, this command results in _one_ of the following actions:
+
+* If the `Q1Data` backup set does not exist, a backup set containing tables `TEAM_3` and `TEAM_4` is created.
+* If the `Q1Data` backup set exists already, the tables `TEAM_3` and `TEAM_4` are added to the `Q1Data` backup set.
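+
+The other subcommands follow the same pattern. Continuing with the `Q1Data` set from this example, a sketch of the
+remaining operations might look like:
+
+[source]
+----
+$ hbase backup set list                      # show all defined backup sets
+$ hbase backup set describe Q1Data           # show the tables in the set and its backup history
+$ hbase backup set remove Q1Data TEAM_4      # drop one table from the set
+$ hbase backup set delete Q1Data             # remove the set entirely
+----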
+
+[[br.administration]]
+## Administration of Backup Images
+
+The `hbase backup` command has several subcommands that help with administering backup images as they accumulate. Most production
+environments require recurring backups, so it is necessary to have utilities to help manage the data of the backup repository.
+Some subcommands enable you to find information that can help identify backups that are relevant in a search for particular data.
+You can also delete backup images.
+
+The following list details each `hbase backup` subcommand that can help administer backups. Run the full command-subcommand line as
+the HBase superuser.
+
+// hbase backup progress
+
+[[br.managing.backup.progress]]
+### Managing Backup Progress
+
+You can monitor a running backup in another terminal session by running the `hbase backup progress` command and specifying the backup ID as an argument.
+
+For example, run the following command as the HBase superuser to view the progress of a backup:
+
+[source]
+----
+$ hbase backup progress <backup_id>
+----
+
+[[br.progress.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  Specifies the backup whose progress information you want to monitor. The backup ID is case-sensitive.
+
+[[br.progress.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.progress.example]]
+#### Example usage
+
+[source]
+----
+hbase backup progress backupId_1467823988425
+----
+
+// hbase backup history
+
+[[br.managing.backup.history]]
+### Managing Backup History
+
+This command displays a log of backup sessions. The information for each session includes backup ID, type (full or incremental), the tables
+in the backup, status, and start and end time. Specify the number of backup sessions to display with the optional _-n_ argument.
+
+[source]
+----
+$ hbase backup history <backup_id>
+----
+
+[[br.history.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  (Optional) Specifies a single backup to display history for. If no ID is given, all backup sessions are listed. The backup ID is case-sensitive.
+
+[[br.history.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+_-n <num_records>_::
+  (Optional) The maximum number of backup records to display (Default: 10).
+
+_-p <backup_root_path>_::
+  The full filesystem URI of where backup images are stored.
+
+_-s <backup_set_name>_::
+  The name of the backup set to obtain history for. Mutually exclusive with the _-t_ option.
+
+_-t <table_name>_::
+  The name of the table to obtain history for. Mutually exclusive with the _-s_ option.
+
+[[br.history.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup history
+$ hbase backup history -n 20
+$ hbase backup history -t WebIndexRecords
+----
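+
+The _-s_ and _-p_ options follow the same pattern. For instance, a sketch restricting the history to the `Q1Data`
+backup set defined earlier, or to the hypothetical backup root path used in this chapter's examples:
+
+[source]
+----
+$ hbase backup history -s Q1Data -n 5
+$ hbase backup history -p hdfs://host5:8020/data/backup
+----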
+
+// hbase backup describe
+
+[[br.describe.backup]]
+### Describing a Backup Image
+
+This command can be used to obtain information about a specific backup image.
+
+[source]
+----
+$ hbase backup describe <backup_id>
+----
+
+[[br.describe.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  The ID of the backup image to describe.
+
+[[br.describe.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.describe.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup describe backupId_1467823988425
+----
+
+// hbase backup delete
+
+[[br.delete.backup]]
+### Deleting a Backup Image
+
+This command can be used to delete a backup image which is no longer needed.
+
+[source]
+----
+$ hbase backup delete <backup_id>
+----
+
+[[br.delete.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  The ID of the backup image which should be deleted.
+
+[[br.delete.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.delete.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup delete backupId_1467823988425
+----
+
+// hbase backup repair
+
+[[br.repair.backup]]
+### Backup Repair Command
+
+This command attempts to correct any inconsistencies in persisted backup metadata which exist as
+the result of software errors or unhandled failure scenarios. While the backup implementation tries
+to correct all errors on its own, this tool may be necessary in cases where the system cannot
+recover automatically.
+
+[source]
+----
+$ hbase backup repair
+----
+
+[[br.repair.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+None.
+
+[[br.repair.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.repair.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup repair
+----
+
+[[br.backup.configuration]]
+## Configuration keys
+
+The backup and restore feature includes both required and optional configuration keys.
+
+### Required properties
+
+_hbase.backup.enable_: Controls whether or not the feature is enabled (Default: `false`). Set this value to `true`.
+
+_hbase.master.logcleaner.plugins_: A comma-separated list of classes invoked when cleaning logs in the HBase Master. Set
+this value to `org.apache.hadoop.hbase.backup.master.BackupLogCleaner` or append it to the current value.
+
+_hbase.procedure.master.classes_: A comma-separated list of classes invoked with the Procedure framework in the Master. Set
+this value to `org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager` or append it to the current value.
+
+_hbase.procedure.regionserver.classes_: A comma-separated list of classes invoked with the Procedure framework in the RegionServer.
+Set this value to `org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager` or append it to the current value.
+
+_hbase.coprocessor.region.classes_: A comma-separated list of RegionObservers deployed on tables. Set this value to
+`org.apache.hadoop.hbase.backup.BackupObserver` or append it to the current value.
+
+_hbase.master.hfilecleaner.plugins_: A comma-separated list of HFileCleaners deployed on the Master. Set this value
+to `org.apache.hadoop.hbase.backup.BackupHFileCleaner` or append it to the current value.
+
+### Optional properties
+
+_hbase.backup.system.ttl_: The time-to-live in seconds of data in the `hbase:backup` system table (default: forever). This property
+is only relevant prior to the creation of the `hbase:backup` table. Use the `alter` command in the HBase shell to modify the TTL
+when this table already exists. See the <<br.filesystem.growth.warning,below section>> for more details on the impact of this
+configuration property.
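+
+For example, once the table exists, a 30-day TTL could be applied from the HBase shell along the following lines. The
+column family name is a placeholder; check the actual family reported by `describe` on your cluster first.
+
+[source]
+----
+hbase> describe 'hbase:backup'
+hbase> alter 'hbase:backup', { NAME => 'meta', TTL => 2592000 }  # 'meta' is a placeholder family; 2592000 seconds = 30 days
+----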
+
+_hbase.backup.attempts.max_: The number of attempts to perform when taking HBase table snapshots (default: 10).
+
+_hbase.backup.attempts.pause.ms_: The amount of time to wait between failed snapshot attempts in milliseconds (default: 10000).
+
+_hbase.backup.logroll.timeout.millis_: The amount of time (in milliseconds) to wait for RegionServers to execute a WAL rolling
+in the Master's procedure framework (default: 30000).
+
+[[br.best.practices]]
+## Best Practices
+
+### Formulate a restore strategy and test it.
+
+Before you rely on a backup and restore strategy for your production environment, identify how backups must be performed,
+and more importantly, how restores must be performed. Test the plan to ensure that it is workable.
+At a minimum, store backup data from a production cluster on a different cluster or server. To further safeguard the data,
+use a backup location that is at a different physical location.
+
+If you have an unrecoverable loss of data on your primary production cluster as a result of computer system issues, you may
+be able to restore the data from a different cluster or server at the same site. However, a disaster that destroys the whole
+site renders locally stored backups useless. Consider storing the backup data and necessary resources (both computing capacity
+and operator expertise) to restore the data at a site sufficiently remote from the production site. In the case of a catastrophe
+at the whole primary site (fire, earthquake, etc.), the remote backup site can be very valuable.
+
+### Secure a full backup image first.
+
+As a baseline, you must complete a full backup of HBase data at least once before you can rely on incremental backups. The full
+backup should be stored outside of the source cluster. To ensure complete dataset recovery, you must run the restore utility
+with the option to restore the baseline full backup. The full backup is the foundation of your dataset. Incremental backup data
+is applied on top of the full backup during the restore operation to return you to the point in time when the backup was last taken.
+
+### Define and use backup sets for groups of tables that are logical subsets of the entire dataset.
+
+You can group tables into an object called a backup set. A backup set can save time when you have a particular group of tables
+that you expect to repeatedly back up or restore.
+
+When you create a backup set, you type table names to include in the group. The backup set includes not only groups of related
+tables, but also retains the HBase backup metadata. Afterwards, you can invoke the backup set name to indicate what tables apply
+to the command execution instead of entering all the table names individually.
+
+### Document the backup and restore strategy, and ideally log information about each backup.
+
+Document the whole process so that the knowledge base can transfer to new administrators after employee turnover. As an extra
+safety precaution, also log the calendar date, time, and other relevant details about the data of each backup. This metadata
+can potentially help locate a particular dataset in case of source cluster failure or primary site disaster. Maintain duplicate
+copies of all documentation: one copy at the production cluster site and another at the backup location or wherever it can be
+accessed by an administrator remotely from the production cluster.
+
+[[br.s3.backup.scenario]]
+## Scenario: Safeguarding Application Datasets on Amazon S3
+
+This scenario describes how a hypothetical retail business uses backups to safeguard application data and then restore the dataset
+after failure.
+
+The HBase administration team uses backup sets to store data from a group of tables that have interrelated information for an
+application called green. In this example, one table contains transaction records and the other contains customer details. The
+two tables need to be backed up and be recoverable as a group.
+
+The admin team also wants to ensure daily backups occur automatically.
+
+.Tables Composing The Backup Set
+image::backup-app-components.png[]
+
+The following is an outline of the steps and examples of commands that are used to back up the data for the _green_ application and
+to recover the data later. All commands are run when logged in as HBase superuser.
+
+1. A backup set called _green_set_ is created as an alias for both the transactions table and the customer table. The backup set can
+be used for all operations to avoid typing each table name. The backup set name is case-sensitive and should be formed with only
+printable characters and without spaces.
+
+[source]
+----
+$ hbase backup set add green_set transactions
+$ hbase backup set add green_set customer
+----
+
+2. The first backup of green_set data must be a full backup. The following command example shows how credentials are passed to Amazon
+S3 and specifies the file system with the s3a: prefix.
+
+[source]
+----
+$ ACCESS_KEY=ABCDEFGHIJKLMNOPQRST
+$ SECRET_KEY=123456789abcdefghijklmnopqrstuvwxyzABCD
+$ sudo -u hbase hbase backup create full \
+  s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set
+----
+
+3. Incremental backups should be run according to a schedule that ensures essential data recovery in the event of a catastrophe. At
+this retail company, the HBase admin team decides that automated daily backups secure the data sufficiently. The team decides that
+they can implement this by modifying an existing Cron job that is defined in `/etc/crontab`. Consequently, IT modifies the Cron job
+by adding the following line:
+
+[source]
+----
+@daily hbase hbase backup create incremental s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set
+----
+
+4. A catastrophic IT incident disables the production cluster that the green application uses. An HBase system administrator of the
+backup cluster must restore the _green_set_ dataset to the point in time closest to the recovery objective.
+
+NOTE: If the administrator of the backup HBase cluster has the backup ID with relevant details in accessible records, the following
+search with the `hdfs dfs -ls` command and manually scanning the backup ID list can be bypassed. Consider continuously maintaining
+and protecting a detailed log of backup IDs outside the production cluster in your environment.
+
+The HBase administrator runs the following command on the directory where backups are stored to print the list of successful backup
+IDs on the console:
+
+`hdfs dfs -ls -t /prodhbasebackups/backups`
+
+5. The admin scans the list to see which backup was created at a date and time closest to the recovery objective. To do this, the
+admin converts the calendar timestamp of the recovery point in time to Unix time because backup IDs are uniquely identified with
+Unix time. The backup IDs are listed in reverse chronological order, meaning the most recent successful backup appears first.
+
+The admin notices that the following line in the command output corresponds with the _green_set_ backup that needs to be restored:
+
+`/prodhbasebackups/backups/backup_1467823988425`
+
+6. The admin restores _green_set_, invoking the backup ID and the _-overwrite_ option. The _-overwrite_ option truncates all existing data
+in the destination and populates the tables with data from the backup dataset. Without this flag, the backup data is appended to the
+existing data in the destination. In this case, the admin decides to overwrite the data because it is corrupted.
+
+[source]
+----
+$ sudo -u hbase hbase restore -s green_set \
+  s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups backup_1467823988425 \
+  -overwrite
+----
+
+[[br.data.security]]
+## Security of Backup Data
+
+With a feature that copies data to remote locations, it is important to take a moment to clearly state the procedural
+concerns that exist around data security. Like the HBase replication feature, backup and restore provides the constructs to automatically
+copy data from within a corporate boundary to some system outside of that boundary. When storing sensitive data, it is imperative that
+the locations to which backup and restore (or any other feature which extracts data from HBase) sends data have undergone a security audit to ensure
+that only authenticated users are allowed to access that data.
+
+For example, with the above example of backing up data to S3, it is of the utmost importance that the proper permissions are assigned
+to the S3 bucket to ensure that only a minimum set of authorized users are allowed to access this data. Because the data is no longer
+being accessed via HBase, and its authentication and authorization controls, we must ensure that the filesystem storing that data is
+providing a comparable level of security. This is a manual step which users *must* implement on their own.
+
+[[br.technical.details]]
+## Technical Details of Incremental Backup and Restore
+
+HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore
+solutions, such as those that only used HBase Export and Import APIs. Incremental backups use Write Ahead Logs (WALs) to capture
+the data changes since the previous backup was created. A WAL roll (creating new WALs) is executed across all RegionServers to track
+the WALs that need to be in the backup.
+
+After the incremental backup image is created, the source backup files usually are on the same node as the data source. A process similar
+to the DistCp (distributed copy) tool is used to move the source backup files to the target file systems. When a table restore operation
+starts, a two-step process is initiated. First, the full backup is restored from the full backup image. Second, all WAL files from
+incremental backups between the last full backup and the incremental backup being restored are converted to HFiles, which the HBase
+Bulk Load utility automatically imports as restored data in the table.
+
+You can only restore on a live HBase cluster because the data must be redistributed to complete the restore operation successfully.
+
+[[br.filesystem.growth.warning]]
+## A Warning on File System Growth
+
+As a reminder, incremental backups are implemented via retaining the write-ahead logs which HBase primarily uses for data durability.
+Thus, to ensure that all data needing to be included in a backup is still available in the system, the HBase backup and restore feature
+retains all write-ahead logs since the last backup until the next incremental backup is executed.
+
+Like HBase Snapshots, this can have an unexpectedly large impact on the HDFS usage of HBase for high volume tables. Take care in enabling
+and using the backup and restore feature, specifically with a mind to removing backup sessions when they are not actively being used.
+
+The only automated, upper-bound on retained write-ahead logs for backup and restore is based on the TTL of the `hbase:backup` system table which,
+as of the time this document is written, is infinite (backup table entries are never automatically deleted). This requires that administrators
+perform backups on a schedule whose frequency is relative to the amount of available space on HDFS (e.g. less available HDFS space requires
+more aggressive backup merges and deletions). As a reminder, the TTL can be altered on the `hbase:backup` table using the `alter` command
+in the HBase shell. Modifying the configuration property `hbase.backup.system.ttl` in hbase-site.xml after the system table exists has no effect.
+
+[[br.backup.capacity.planning]]
+## Capacity Planning
+
+When designing a distributed system deployment, it is critical that some basic mathematical rigor is executed to ensure sufficient computational
+capacity is available given the data and software requirements of the system. For this feature, the availability of network capacity is the largest
+bottleneck when estimating the performance of some implementation of backup and restore. The second most costly function is the speed at which
+data can be read/written.
+
+### Full Backups
+
+To estimate the duration of a full backup, we have to understand the general actions which are invoked:
+
+* Write-ahead log roll on each RegionServer: ones to tens of seconds per RegionServer in parallel. Relative to the load on each RegionServer.
+* Take an HBase snapshot of the table(s): tens of seconds. Relative to the number of regions and files that comprise the table.
+* Export the snapshot to the destination: see below. Relative to the size of the data and the network bandwidth to the destination.
+
+[[br.export.snapshot.cost]]
+To approximate how long the final step will take, we have to make some assumptions on hardware. Be aware that these will *not* be accurate for your
+system -- these are numbers that you or your administrator know for your system. Let's say the speed of reading data from HDFS on a single node is
+capped at 80MB/s (across all Mappers that run on that host), a modern network interface controller (NIC) supports 10Gb/s, the top-of-rack switch can
+handle 40Gb/s, and the WAN between your clusters is 10Gb/s. This means that you can only ship data to your remote cluster at a speed of 1.25GB/s -- meaning
+that 16 nodes (`1.25 * 1024 / 80 = 16`) participating in the ExportSnapshot should be able to fully saturate the link between clusters. With more
+nodes in the cluster, we can still saturate the network but at a lesser impact on any one node which helps ensure local SLAs are made. If the size
+of the snapshot is 10TB, this full backup would take in the ballpark of 2.5 hours (`10 * 1024 / 1.25 / (60 * 60) = 2.28hrs`).
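+
+Restating those back-of-the-envelope figures in one place (all of them assumptions to be replaced with measurements
+from your own hardware):
+
+[source]
+----
+# WAN bandwidth between clusters:   10 Gb/s  ~= 1.25 GB/s
+# Per-node HDFS read limit:         80 MB/s
+# Nodes to saturate the WAN:        1.25 * 1024 / 80 = 16
+# Time to export a 10 TB snapshot:  10 * 1024 / 1.25 / (60 * 60) ~= 2.28 hours
+----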
+
+As a general statement, it is very likely that the WAN bandwidth between your local cluster and the remote storage is the largest
+bottleneck to the speed of a full backup.
+
+When the concern is restricting the computational impact of backups to a "production system", the above formulas can be reused with the optional
+command-line arguments to `hbase backup create`: `-b`, `-w`, `-q`. The `-b` option defines the bandwidth at which each worker (Mapper) would
+write data. The `-w` argument limits the number of workers that would be spawned in the DistCp job. The `-q` allows the user to specify a YARN
+queue which can limit the specific nodes where the workers will be spawned -- this can quarantine the backup workers performing the copy to
+a set of non-critical nodes. Relating the `-b` and `-w` options to our earlier equations: `-b` would be used to restrict each node from reading
+data at the full 80MB/s and `-w` is used to limit the job from spawning 16 worker tasks.
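+
+As a hedged sketch, a throttled full backup using these options might look like the following. The host name, queue
+name, and limits are hypothetical and should be derived from your own capacity calculations:
+
+[source]
+----
+# 8 workers, capped at 40 MB/s each, scheduled in the backup_queue YARN queue
+$ hbase backup create full hdfs://backup-host:8020/data/backup -t SALES2,SALES3 -w 8 -b 40 -q backup_queue
+----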
+
+### Incremental Backup
+
+Like we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost.
+
+* Identify new write-ahead logs since last full or incremental backup: negligible. A priori knowledge from the backup system table(s).
+* Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed of writing data. Relative to write speed of HDFS.
+* DistCp the HFiles to the destination: <<br.export.snapshot.cost,see above>>.
+
+For the second step, the dominating cost of this operation would be re-writing the data (under the assumption that a majority of the
+data in the WAL is preserved). In this case, we can assume an aggregate write speed of 30MB/s per node. Continuing our 16-node cluster example,
+this would require approximately 15 minutes to perform this step for 50GB of data (`50 * 1024 / 60 / 60 = 14.2`). The amount of time to start the
+DistCp MapReduce job would likely dominate the actual time taken to copy the data (`50 / 1.25 = 40 seconds`) and can be ignored.
+
+[[br.limitations]]
+## Limitations of the Backup and Restore Utility
+
+*Serial backup operations*
+
+Backup operations cannot be run concurrently. An operation includes actions like create, delete, restore, and merge. Only one active backup session is supported. link:https://issues.apache.org/jira/browse/HBASE-16391[HBASE-16391]
+will introduce support for multiple backup sessions.
+
+*No means to cancel backups*
+
+Neither backup nor restore operations can be canceled (link:https://issues.apache.org/jira/browse/HBASE-15997[HBASE-15997], link:https://issues.apache.org/jira/browse/HBASE-15998[HBASE-15998]).
+The workaround to cancel a backup would be to kill the client-side backup command (`control-C`), ensure all relevant MapReduce jobs have exited, and then
+run the `hbase backup repair` command to ensure the system backup metadata is consistent.
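+
+A hedged sketch of that workaround, assuming the in-flight MapReduce job can be located and stopped with the standard
+YARN CLI (the application ID below is hypothetical):
+
+[source]
+----
+$ yarn application -list                                  # locate the backup's MapReduce job, if it is still running
+$ yarn application -kill application_1467823988425_0001   # hypothetical application ID
+$ hbase backup repair                                     # bring the backup system metadata back to a consistent state
+----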
+
+*Backups can only be saved to a single location*
+
+Copying backup information to multiple locations is an exercise left to the user. link:https://issues.apache.org/jira/browse/HBASE-15476[HBASE-15476] will
+introduce the ability to specify multiple backup destinations intrinsically.
+
+*HBase superuser access is required*
+
+Only an HBase superuser (e.g. hbase) is allowed to perform backup/restore, which can pose a problem for shared HBase installations. Current mitigations would require
+coordination with system administrators to build and deploy a backup and restore strategy (link:https://issues.apache.org/jira/browse/HBASE-14138[HBASE-14138]).
+
+*Backup restoration is an online operation*
+
+Performing a restore from a backup requires that the HBase cluster be online; this is a caveat of the current implementation (link:https://issues.apache.org/jira/browse/HBASE-16573[HBASE-16573]).
+
+*Some operations may fail and require re-run*
+
+The HBase backup feature is primarily client driven. While there is the standard HBase retry logic built into the HBase Connection, persistent errors in executing operations
+may propagate back to the client (e.g. snapshot failure due to region splits). The backup implementation should be moved from client-side into the ProcedureV2 framework
+in the future which would provide additional robustness around transient/retryable failures. The `hbase backup repair` command is meant to correct states which the system
+cannot automatically detect and recover from.
+
+*Avoidance of declaration of public API*
+
+While the Java API to interact with this feature exists and its implementation is separated from an interface, insufficient rigor has been applied to determine if
+it is exactly what we intend to ship to users. As such, it is marked as for a `Private` audience with the expectation that, as users begin to try the feature, there
+will be modifications that would necessitate breaking compatibility (link:https://issues.apache.org/jira/browse/HBASE-17517[HBASE-17517]).
+
+*Lack of global metrics for backup and restore*
+
+Individual backup and restore operations contain metrics about the amount of work the operation included, but there is no centralized location (e.g. the Master UI)
+which presents this information for consumption (link:https://issues.apache.org/jira/browse/HBASE-16565[HBASE-16565]).

http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/community.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/community.adoc b/src/main/asciidoc/_chapters/community.adoc
index f63d597..d141dbf 100644
--- a/src/main/asciidoc/_chapters/community.adoc
+++ b/src/main/asciidoc/_chapters/community.adoc
@@ -47,9 +47,9 @@ The below policy is something we put in place 09/2012.
 It is a suggested policy rather than a hard requirement.
 We want to try it first to see if it works before we cast it in stone.
 
-Apache HBase is made of link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components].
+Apache HBase is made of link:https://issues.apache.org/jira/projects/HBASE?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page[components].
 Components have one or more <<owner,OWNER>>s.
-See the 'Description' field on the link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components]        JIRA page for who the current owners are by component.
+See the 'Description' field on the link:https://issues.apache.org/jira/projects/HBASE?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page[components] JIRA page for who the current owners are by component.
 
 Patches that fit within the scope of a single Apache HBase component require, at least, a +1 by one of the component's owners before commit.
 If owners are absent -- busy or otherwise -- two +1s by non-owners will suffice.
@@ -88,7 +88,7 @@ We also are currently in violation of this basic tenet -- replication at least k
 [[owner]]
 .Component Owner/Lieutenant
 
-Component owners are listed in the description field on this Apache HBase JIRA link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components]        page.
+Component owners are listed in the description field on this Apache HBase JIRA link:https://issues.apache.org/jira/projects/HBASE?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page[components] page.
 The owners are listed in the 'Description' field rather than in the 'Component Lead' field because the latter only allows us list one individual whereas it is encouraged that components have multiple owners.
 
 Owners or component lieutenants are volunteers who are (usually, but not necessarily) expert in their component domain and may have an agenda on how they think their Apache HBase component should evolve.

http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/compression.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/compression.adoc b/src/main/asciidoc/_chapters/compression.adoc
index e5b9b8f..23ceeaf 100644
--- a/src/main/asciidoc/_chapters/compression.adoc
+++ b/src/main/asciidoc/_chapters/compression.adoc
@@ -115,12 +115,7 @@ The data format is nearly identical to Diff encoding, so there is not an image t
 Prefix Tree::
   Prefix tree encoding was introduced as an experimental feature in HBase 0.96.
   It provides similar memory savings to the Prefix, Diff, and Fast Diff encoder, but provides faster random access at a cost of slower encoding speed.
-+
-Prefix Tree may be appropriate for applications that have high block cache hit ratios. It introduces new 'tree' fields for the row and column.
-The row tree field contains a list of offsets/references corresponding to the cells in that row. This allows for a good deal of compression.
-For more details about Prefix Tree encoding, see link:https://issues.apache.org/jira/browse/HBASE-4676[HBASE-4676].
-+
-It is difficult to graphically illustrate a prefix tree, so no image is included. See the Wikipedia article for link:http://en.wikipedia.org/wiki/Trie[Trie] for more general information about this data structure.
+  It was removed in hbase-2.0.0. It was a good idea but saw little uptake. If you are interested in reviving this effort, write to the hbase dev list.
 
 [[data.block.encoding.types]]
 === Which Compressor or Data Block Encoder To Use
@@ -267,8 +262,7 @@ See <<brand.new.compressor,brand.new.compressor>>).
 .Install LZO Support
 
 HBase cannot ship with LZO because of incompatibility between HBase, which uses an Apache Software License (ASL) and LZO, which uses a GPL license.
-See the link:http://wiki.apache.org/hadoop/UsingLzoCompression[Using LZO
-              Compression] wiki page for information on configuring LZO support for HBase.
+See the link:https://github.com/twitter/hadoop-lzo/blob/master/README.md[Hadoop-LZO at Twitter] for information on configuring LZO support for HBase.
 
 If you depend upon LZO compression, consider configuring your RegionServers to fail to start if LZO is not available.
 See <<hbase.regionserver.codecs,hbase.regionserver.codecs>>.

http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc
index bf14d11..7218a42 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -43,7 +43,7 @@ _backup-masters_::
 
 _hadoop-metrics2-hbase.properties_::
   Used to connect HBase Hadoop's Metrics2 framework.
-  See the link:http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2[Hadoop Wiki entry] for more information on Metrics2.
+  See the link:https://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2[Hadoop Wiki entry] for more information on Metrics2.
   Contains only commented-out examples by default.
 
 _hbase-env.cmd_ and _hbase-env.sh_::
@@ -124,7 +124,7 @@ NOTE: You must set `JAVA_HOME` on each node of your cluster. _hbase-env.sh_ prov
 [[os]]
 .Operating System Utilities
 ssh::
-  HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
+  HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:https://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
 
 DNS::
   HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving must work in versions of HBase previous to 0.92.0. The link:https://github.com/sujee/hadoop-dns-checker[hadoop-dns-checker] tool can be used to verify DNS is working correctly on the cluster. The project `README` file provides detailed instructions on usage.
@@ -180,13 +180,13 @@ Windows::
 
 
 [[hadoop]]
-=== link:http://hadoop.apache.org[Hadoop](((Hadoop)))
+=== link:https://hadoop.apache.org[Hadoop](((Hadoop)))
 
 The following table summarizes the versions of Hadoop supported with each version of HBase.
 Based on the version of HBase, you should select the most appropriate version of Hadoop.
 You can use Apache Hadoop, or a vendor's distribution of Hadoop.
 No distinction is made here.
-See link:http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support[the Hadoop wiki] for information about vendors of Hadoop.
+See link:https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support[the Hadoop wiki] for information about vendors of Hadoop.
 
 .Hadoop 2.x is recommended.
 [TIP]
@@ -220,6 +220,7 @@ Use the following legend to interpret this table:
 |Hadoop-2.7.0 | X | X | X | X
 |Hadoop-2.7.1+ | NT | S | S | S
 |Hadoop-2.8.0 | X | X | X | X
+|Hadoop-2.8.1 | X | X | X | X
 |Hadoop-3.0.0-alphax | NT | NT | NT | NT
 |===
 
@@ -250,7 +251,7 @@ Hadoop version 2.7.0 is not tested or supported as the Hadoop PMC has explicitly
 .Hadoop 2.8.x
 [TIP]
 ====
-Hadoop version 2.8.0 is not tested or supported as the Hadoop PMC has explicitly labeled that release as not being stable. (reference the link:https://s.apache.org/hadoop-2.8.0-announcement[announcement of Apache Hadoop 2.8.0].)
+Hadoop versions 2.8.0 and 2.8.1 are not tested or supported as the Hadoop PMC has explicitly labeled those releases as not being stable. (reference the link:https://s.apache.org/hadoop-2.8.0-announcement[announcement of Apache Hadoop 2.8.0] and link:https://s.apache.org/hadoop-2.8.1-announcement[announcement of Apache Hadoop 2.8.1].)
 ====
 
 .Replace the Hadoop Bundled With HBase!
@@ -356,7 +357,7 @@ Distributed mode can be subdivided into distributed but all daemons run on a sin
 The _pseudo-distributed_ vs. _fully-distributed_ nomenclature comes from Hadoop.
 
 Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the _Hadoop Distributed File System_ (HDFS). Fully-distributed mode can ONLY run on HDFS.
-See the Hadoop link:http://hadoop.apache.org/docs/current/[documentation] for how to set up HDFS.
+See the Hadoop link:https://hadoop.apache.org/docs/current/[documentation] for how to set up HDFS.
 A good walk-through for setting up HDFS on Hadoop 2 can be found at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide.
 
 [[pseudo]]
@@ -546,19 +547,14 @@ Usually this ensemble location is kept out in the _hbase-site.xml_ and is picked
 
 If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
 
-Minimally, an HBase client needs several libraries in its `CLASSPATH` when connecting to a cluster, including:
-[source]
+Minimally, an HBase client needs the hbase-client module in its dependencies when connecting to a cluster:
+[source,xml]
 ----
-
-commons-configuration (commons-configuration-1.6.jar)
-commons-lang (commons-lang-2.5.jar)
-commons-logging (commons-logging-1.1.1.jar)
-hadoop-core (hadoop-core-1.0.0.jar)
-hbase (hbase-0.92.0.jar)
-log4j (log4j-1.2.16.jar)
-slf4j-api (slf4j-api-1.5.8.jar)
-slf4j-log4j (slf4j-log4j12-1.5.8.jar)
-zookeeper (zookeeper-3.4.2.jar)
+<dependency>
+  <groupId>org.apache.hbase</groupId>
+  <artifactId>hbase-client</artifactId>
+  <version>1.2.4</version>
+</dependency>
 ----
 
 A basic example _hbase-site.xml_ for client only may look as follows:
@@ -579,7 +575,7 @@ A basic example _hbase-site.xml_ for client only may look as follows:
 [[java.client.config]]
 ==== Java client configuration
 
-The configuration used by a Java client is kept in an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
+The configuration used by a Java client is kept in an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
 
 The factory method on HBaseConfiguration, `HBaseConfiguration.create();`, on invocation, will read in the content of the first _hbase-site.xml_ found on the client's `CLASSPATH`, if one is present (Invocation will also factor in any _hbase-default.xml_ found; an _hbase-default.xml_ ships inside the _hbase.X.X.X.jar_). It is also possible to specify configuration directly without having to read from a _hbase-site.xml_.
 For example, to set the ZooKeeper ensemble for the cluster programmatically do as follows:
@@ -590,7 +586,7 @@ Configuration config = HBaseConfiguration.create();
 config.set("hbase.zookeeper.quorum", "localhost");  // Here we are running zookeeper locally
 ----
 
-If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the _hbase-site.xml_ file). This populated `Configuration` instance can then be passed to an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], and so on.
+If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the _hbase-site.xml_ file). This populated `Configuration` instance can then be passed to an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], and so on.
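+
+For example, a minimal sketch of handing the populated `Configuration` to a connection and obtaining a `Table` from it (the table name `myTable` is a placeholder):
+
+[source,java]
+----
+Connection connection = ConnectionFactory.createConnection(config);
+Table table = connection.getTable(TableName.valueOf("myTable"));  // "myTable" is a placeholder table name
+----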
 
 [[example_config]]
 == Example Configurations
@@ -788,7 +784,7 @@ Manual splitting can mitigate region creation and movement under load.
 It also makes it so region boundaries are known and invariant (if you disable region splitting). If you use manual splits, it is easier to do staggered, time-based major compactions to spread out your network IO load.
 
 .Disable Automatic Splitting
-To disable automatic splitting, set `hbase.hregion.max.filesize` to a very large value, such as `100 GB` It is not recommended to set it to its absolute maximum value of `Long.MAX_VALUE`.
+To disable automatic splitting, you can set the region split policy, in either the cluster configuration or the table configuration, to `org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy`.
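+
+For example, a sketch of disabling splitting cluster-wide via _hbase-site.xml_ (the same policy class can also be set per table on the table descriptor):
+
+[source,xml]
+----
+<property>
+  <name>hbase.regionserver.region.split.policy</name>
+  <value>org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy</value>
+</property>
+----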
 
 .Automatic Splitting Is Recommended
 [NOTE]
@@ -824,7 +820,7 @@ See the entry for `hbase.hregion.majorcompaction` in the <<compaction.parameters
 ====
 Major compactions are absolutely necessary for StoreFile clean-up.
 Do not disable them altogether.
-You can run major compactions manually via the HBase shell or via the http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact(org.apache.hadoop.hbase.TableName)[Admin API].
+You can run major compactions manually via the HBase shell or via the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin API].
 ====
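+
+As a sketch, a major compaction of a hypothetical table `t1` can be triggered with the shell command `major_compact 't1'`, or through the Admin API:
+
+[source,java]
+----
+try (Admin admin = connection.getAdmin()) {
+  admin.majorCompact(TableName.valueOf("t1"));  // "t1" is a placeholder; requests a major compaction of each of its regions
+}
+----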
 
 For more information about compactions and the compaction file selection process, see <<compaction,compaction>>

http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/cp.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/cp.adoc b/src/main/asciidoc/_chapters/cp.adoc
index 2f5267f..abe334c 100644
--- a/src/main/asciidoc/_chapters/cp.adoc
+++ b/src/main/asciidoc/_chapters/cp.adoc
@@ -61,7 +61,7 @@ coprocessor can severely degrade cluster performance and stability.
 
 In HBase, you fetch data using a `Get` or `Scan`, whereas in an RDBMS you use a SQL
 query. In order to fetch only the relevant data, you filter it using a HBase
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter]
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter]
 , whereas in an RDBMS you use a `WHERE` predicate.
 
 After fetching the data, you perform computations on it. This paradigm works well
@@ -121,8 +121,8 @@ package.
 
 Observer coprocessors are triggered either before or after a specific event occurs.
 Observers that happen before an event use methods that start with a `pre` prefix,
-such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`prePut`]. Observers that happen just after an event override methods that start
-with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`postPut`].
+such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`prePut`]. Observers that happen just after an event override methods that start
+with a `post` prefix, such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`postPut`].
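+
+For illustration, a minimal sketch of a RegionObserver that overrides `prePut` (the class name is hypothetical; coprocessor registration and loading are not shown):
+
+[source,java]
+----
+import java.io.IOException;
+import org.apache.hadoop.hbase.client.Durability;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.coprocessor.RegionObserver;
+import org.apache.hadoop.hbase.wal.WALEdit;
+
+// A hypothetical observer class; how it is registered and loaded is omitted here.
+public class SketchPrePutObserver implements RegionObserver {
+  // Runs just before each Put is applied to the region, so the Put can be
+  // inspected or adjusted before it is written.
+  @Override
+  public void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put,
+                     WALEdit edit, Durability durability) throws IOException {
+    // e.g. validate the incoming Put against application-level rules
+  }
+}
+----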
 
 
 ==== Use Cases for Observer Coprocessors
@@ -139,7 +139,7 @@ Referential Integrity::
 
 Secondary Indexes::
   You can use a coprocessor to maintain secondary indexes. For more information, see
-  link:http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
+  link:https://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
 
 
 ==== Types of Observer Coprocessor
@@ -163,7 +163,7 @@ MasterObserver::
 WalObserver::
   A WalObserver allows you to observe events related to writes to the Write-Ahead
   Log (WAL). See
-  link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
 
 <<cp_example,Examples>> provides working examples of observer coprocessors.
 
@@ -178,7 +178,7 @@ average or summation for an entire table which spans hundreds of regions.
 
 In contrast to observer coprocessors, where your code is run transparently, endpoint
 coprocessors must be explicitly invoked using the
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html#coprocessorService%28java.lang.Class,%20byte%5B%5D,%20byte%5B%5D,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call%29[CoprocessorService()]
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html#coprocessorService-java.lang.Class-byte:A-byte:A-org.apache.hadoop.hbase.client.coprocessor.Batch.Call-[CoprocessorService()]
 method available in
 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html[Table]
 or

http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/datamodel.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/datamodel.adoc b/src/main/asciidoc/_chapters/datamodel.adoc
index da4143a..3674566 100644
--- a/src/main/asciidoc/_chapters/datamodel.adoc
+++ b/src/main/asciidoc/_chapters/datamodel.adoc
@@ -67,7 +67,7 @@ Timestamp::
 [[conceptual.view]]
 == Conceptual View
 
-You can read a very understandable explanation of the HBase data model in the blog post link:http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable[Understanding HBase and BigTable] by Jim R. Wilson.
+You can read a very understandable explanation of the HBase data model in the blog post link:http://jimbojw.com/#understanding%20hbase[Understanding HBase and BigTable] by Jim R. Wilson.
 Another good explanation is available in the PDF link:http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf[Introduction to Basic Schema Design] by Amandeep Khurana.
 
 It may help to read different perspectives to get a solid understanding of HBase schema design.
@@ -173,7 +173,7 @@ This abstraction lays the groundwork for upcoming multi-tenancy related features
 
 * Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume.
 * Namespace Security Administration (link:https://issues.apache.org/jira/browse/HBASE-9206[HBASE-9206]) - Provide another level of security administration for tenants.
-* Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a course level of isolation.
+* Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a coarse level of isolation.
 
 [[namespace_creation]]
 === Namespace management
@@ -270,21 +270,21 @@ Cell content is uninterpreted bytes
 == Data Model Operations
 
 The four primary data model operations are Get, Put, Scan, and Delete.
-Operations are applied via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.
+Operations are applied via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.
 
 === Get
 
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
-Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)[Table.get].
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
+Gets are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get-org.apache.hadoop.hbase.client.Get-[Table.get].
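+
+A minimal sketch of a Get against a hypothetical `Table` handle named `table` (row, family and qualifier names are placeholders):
+
+[source,java]
+----
+Get get = new Get(Bytes.toBytes("row1"));                                     // placeholder row key
+Result result = table.get(get);
+byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));   // placeholder family/qualifier
+----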
 
 === Put
 
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List,%20java.lang.Object%5B%5D)[Table.batch] (non-writeBuffer).
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put-org.apache.hadoop.hbase.client.Put-[Table.put] (non-writeBuffer) or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[Table.batch] (non-writeBuffer).
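+
+A similar sketch for a Put, again assuming a hypothetical `Table` handle named `table`:
+
+[source,java]
+----
+Put put = new Put(Bytes.toBytes("row1"));   // placeholder row key
+put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes("value1"));
+table.put(put);
+----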
 
 [[scan]]
 === Scans
 
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allow iteration over multiple rows for specified attributes.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes.
 
 The following is an example of a Scan on a Table instance.
 Assume that a table is populated with rows with keys "row1", "row2", "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan instance to return the rows beginning with "row".
@@ -311,12 +311,12 @@ try {
 }
 ----
 
-Note that generally the easiest way to specify a specific stop point for a scan is by using the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.
+Note that generally the easiest way to specify a specific stop point for a scan is by using the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.
 
 === Delete
 
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
-Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)[Table.delete].
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
+Deletes are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[Table.delete].
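+
+A minimal sketch of deleting a hypothetical row, assuming the same `Table` handle:
+
+[source,java]
+----
+Delete delete = new Delete(Bytes.toBytes("row1"));  // placeholder row key
+table.delete(delete);
+----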
 
 HBase does not modify data in place, and so deletes are handled by creating new markers called _tombstones_.
 These tombstones, along with the dead values, are cleaned up on major compactions.
@@ -341,7 +341,7 @@ In particular:
 * It is OK to write cells in a non-increasing version order.
 
 Below we describe how the version dimension in HBase currently works.
-See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for discussion of HBase versions. link:http://outerthought.org/blog/417-ot.html[Bending time in HBase] makes for a good read on the version, or time, dimension in HBase.
+See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for discussion of HBase versions. link:https://www.ngdata.com/bending-time-in-hbase/[Bending time in HBase] makes for a good read on the version, or time, dimension in HBase.
 It has more detail on versioning than is provided here.
 As of this writing, the limitation _Overwriting values at existing timestamps_ mentioned in the article no longer holds in HBase.
 This section is basically a synopsis of this article by Bruno Dumon.
@@ -355,7 +355,7 @@ Prior to HBase 0.96, the default number of versions kept was `3`, but in 0.96 an
 .Modify the Maximum Number of Versions for a Column Family
 ====
 This example uses HBase Shell to keep a maximum of 5 versions of all columns in column family `f1`.
-You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
+You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
 
 ----
 hbase> alter 't1', NAME => 'f1', VERSIONS => 5
@@ -367,7 +367,7 @@ hbase> alter ‘t1′, NAME => ‘f1′, VERSIONS => 5
 You can also specify the minimum number of versions to store per column family.
 By default, this is set to 0, which means the feature is disabled.
 The following example sets the minimum number of versions on all columns in column family `f1` to `2`, via HBase Shell.
-You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
+You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
 
 ----
 hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2
@@ -385,12 +385,12 @@ In this section we look at the behavior of the version dimension for each of the
 ==== Get/Scan
 
 Gets are implemented on top of Scans.
-The below discussion of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
+The below discussion of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
 
 By default, i.e. if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways:
 
-* to return more than one version, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()[Get.setMaxVersions()]
-* to return versions other than the latest, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange(long,%20long)[Get.setTimeRange()]
+* to return more than one version, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions--[Get.setMaxVersions()]
+* to return versions other than the latest, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange-long-long-[Get.setTimeRange()]
 +
 To retrieve the latest version that is less than or equal to a given value, thus giving the 'latest' state of the record at a certain point in time, just use a range from 0 to the desired version and set the max versions to 1.
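++
+A minimal sketch of this pattern, assuming a `Table` handle `table` and a timestamp of interest `ts` (names are hypothetical):
++
+[source,java]
+----
+Get get = new Get(Bytes.toBytes("row1"));  // placeholder row key
+get.setTimeRange(0, ts + 1);               // the upper bound of the time range is exclusive
+get.setMaxVersions(1);                     // return only the newest version within the range
+Result latest = table.get(get);
+----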
 
@@ -525,7 +525,7 @@ _...create three cell versions at t1, t2 and t3, with a maximum-versions
     setting of 2. So when getting all versions, only the values at t2 and t3 will be
     returned. But if you delete the version at t2 or t3, the one at t1 will appear again.
     Obviously, once a major compaction has run, such behavior will not be the case
-    anymore..._ (See _Garbage Collection_ in link:http://outerthought.org/blog/417-ot.html[Bending time in HBase].)
+    anymore..._ (See _Garbage Collection_ in link:https://www.ngdata.com/bending-time-in-hbase/[Bending time in HBase].)
 
 [[dm.sort]]
 == Sort Order