Posted to commits@hbase.apache.org by sa...@apache.org on 2021/04/30 20:42:31 UTC

[hbase] branch master updated: HBASE-25816: Improve the documentation of Architecture section of reference guide (#3211)

This is an automated email from the ASF dual-hosted git repository.

sakthi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hbase.git


The following commit(s) were added to refs/heads/master by this push:
     new 5d42f58  HBASE-25816: Improve the documentation of Architecture section of reference guide (#3211)
5d42f58 is described below

commit 5d42f58ff604497b083e8e2dae0347f1fb3618fa
Author: Kota-SH <sh...@gmail.com>
AuthorDate: Fri Apr 30 13:42:06 2021 -0700

    HBASE-25816: Improve the documentation of Architecture section of reference guide (#3211)
    
    Signed-off-by: Sakthi <sa...@apache.org>
---
 src/main/asciidoc/_chapters/architecture.adoc | 28 +++++++++++++--------------
 src/main/asciidoc/_chapters/hbase_mob.adoc    | 10 +++++-----
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 0b12d29..5e27459 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -293,7 +293,7 @@ HMasters instead of ZooKeeper ensemble`
 
 To reduce hot-spotting on a single master, all the masters (active & stand-by) expose the needed
 service to fetch the connection metadata. This lets the client connect to any master (not just active).
-Both ZooKeeper- and Master-based connection registry implementations are available in 2.3+. For
+Both ZooKeeper-based and Master-based connection registry implementations are available in 2.3+. For
 2.3 and earlier, the ZooKeeper-based implementation remains the default configuration.
 The Master-based implementation becomes the default in 3.0.0.
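
(For readers who want to try the Master-based registry on a 2.3+ client, the switch is a client-side configuration change. The sketch below is written from memory of the 2.3 "Master Registry" documentation, not from this commit: the property name `hbase.client.registry.impl`, the `MasterRegistry` class name, and the `hbase.masters` endpoint list are assumptions to verify against your release.)

[source,xml]
----
<!-- Client-side hbase-site.xml sketch; property and class names are assumed, verify for your release -->
<property>
  <name>hbase.client.registry.impl</name>
  <value>org.apache.hadoop.hbase.client.MasterRegistry</value>
</property>
<property>
  <!-- Comma-separated master endpoints the client may contact for connection metadata -->
  <name>hbase.masters</name>
  <value>master1.example.com:16000,master2.example.com:16000</value>
</property>
----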
 
@@ -437,7 +437,7 @@ ValueFilter vf = new ValueFilter(CompareOperator.EQUAL,
 scan.setFilter(vf);
 ...
 ----
-This scan will restrict to the specified column 'family:qualifier', avoiding scan unrelated
+This scan will restrict to the specified column 'family:qualifier', avoiding scan of unrelated
 families and columns, which has better performance, and `ValueFilter` is the condition used to do
 the value filtering.
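
(The snippet in the hunk above elides its surroundings with `...`. As a self-contained illustration of the same pattern, here is a minimal sketch; the table name `my-table`, the column `family:qualifier`, and the value `my-value` are placeholders, not something this commit introduces.)

[source,java]
----
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ValueFilterScanSketch {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("my-table"))) {
      Scan scan = new Scan();
      // Restrict the scan to the single column 'family:qualifier' so unrelated
      // families and columns are never read ...
      scan.addColumn(Bytes.toBytes("family"), Bytes.toBytes("qualifier"));
      // ... and let ValueFilter keep only cells whose value equals "my-value".
      ValueFilter vf = new ValueFilter(CompareOperator.EQUAL,
          new BinaryComparator(Bytes.toBytes("my-value")));
      scan.setFilter(vf);
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
          System.out.println(result);
        }
      }
    }
  }
}
----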
 
@@ -664,7 +664,7 @@ If the active Master loses its lease in ZooKeeper (or the Master shuts down), th
 [[master.runtime]]
 === Runtime Impact
 
-A common dist-list question involves what happens to an HBase cluster when the Master goes down. This information has changed staring 3.0.0.
+A common dist-list question involves what happens to an HBase cluster when the Master goes down. This information has changed starting 3.0.0.
 
 ==== Up until releases 2.x.y
 Because the HBase client talks directly to the RegionServers, the cluster can still function in a "steady state". Additionally, per <<arch.catalog>>, `hbase:meta` exists as an HBase table and is not resident in the Master.
@@ -719,7 +719,7 @@ _MasterProcWAL is replaced in hbase-2.3.0 by an alternate Procedure Store implem
 HMaster records administrative operations and their running states, such as the handling of a crashed server,
 table creation, and other DDLs, into a Procedure Store. The Procedure Store WALs are stored under the
 MasterProcWALs directory. The Master WALs are not like RegionServer WALs. Keeping up the Master WAL allows
-us run a state machine that is resilient across Master failures. For example, if a HMaster was in the
+us to run a state machine that is resilient across Master failures. For example, if a HMaster was in the
 middle of creating a table encounters an issue and fails, the next active HMaster can take up where
 the previous left off and carry the operation to completion. Since hbase-2.0.0, a
 new AssignmentManager (A.K.A AMv2) was introduced and the HMaster handles region assignment
@@ -920,7 +920,7 @@ The reason it is included in this equation is that it would be unrealistic to sa
 Here are some examples:
 
 * One region server with the heap size set to 1 GB and the default block cache size will have 405 MB of block cache available.
-* 20 region servers with the heap size set to 8 GB and a default block cache size will have 63.3 of block cache.
+* 20 region servers with the heap size set to 8 GB and a default block cache size will have 63.3 GB of block cache.
 * 100 region servers with the heap size set to 24 GB and a block cache size of 0.5 will have about 1.16 TB of block cache.
 
 Your data is not the only resident of the block cache.
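
(As a quick check on the corrected 63.3 GB figure: these bullets follow the sizing formula given just above this hunk in the guide, number of region servers x heap size x `hfile.block.cache.size` x 0.99, with the default `hfile.block.cache.size` of 0.4. That gives 1 x 1024 MB x 0.4 x 0.99, about 405 MB; 20 x 8192 MB x 0.4 x 0.99, about 64,880 MB or 63.3 GB; and 100 x 24 GB x 0.5 x 0.99, about 1.16 TB.)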
@@ -933,7 +933,7 @@ NOTE: The hbase:meta tables can occupy a few MBs depending on the number of regi
 
 HFiles Indexes::
   An _HFile_ is the file format that HBase uses to store data in HDFS.
-  It contains a multi-layered index which allows HBase to seek to the data without having to read the whole file.
+  It contains a multi-layered index which allows HBase to seek the data without having to read the whole file.
   The size of those indexes is a factor of the block size (64KB by default), the size of your keys and the amount of data you are storing.
   For big data sets it's not unusual to see numbers around 1GB per region server, although not all of it will be in cache because the LRU will evict indexes that aren't used.
 
@@ -974,7 +974,7 @@ Since link:https://issues.apache.org/jira/browse/HBASE-4683[HBASE-4683 Always ca
 [[enable.bucketcache]]
 ===== How to Enable BucketCache
 
-The usual deploy of BucketCache is via a managing class that sets up two caching tiers:
+The usual deployment of BucketCache is via a managing class that sets up two caching tiers:
 an on-heap cache implemented by LruBlockCache and a second  cache implemented with BucketCache.
 The managing class is link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html[CombinedBlockCache] by default.
 The previous link describes the caching 'policy' implemented by CombinedBlockCache.
@@ -1005,7 +1005,7 @@ HBASE-11425 changed the HBase read path so it could hold the read-data off-heap
 See <<regionserver.offheap.readpath>>. In hbase-2.0.0, off-heap latencies approach those of on-heap cache latencies with the added
 benefit of NOT provoking GC.
 +
-From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When BucketCache is turned on, the DATA blocks will always go to BucketCache and INDEX/BLOOM blocks go to on heap LRUBlockCache. `cacheDataInL1` support hase been removed.
+From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When BucketCache is turned on, the DATA blocks will always go to BucketCache and INDEX/BLOOM blocks go to on heap LRUBlockCache. `cacheDataInL1` support has been removed.
 ====
 
 [[bc.deloy.modes]]
@@ -1013,7 +1013,7 @@ From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When Bu
 The BucketCache Block Cache can be deployed _offheap_, _file_ or _mmaped_ file mode.
 
 You set which via the `hbase.bucketcache.ioengine` setting.
-Setting it to `offheap` will have BucketCache make its allocations off-heap, and an ioengine setting of `file:PATH_TO_FILE` will direct BucketCache to use file caching (Useful in particular if you have some fast I/O attached to the box such as SSDs). From 2.0.0, it is possible to have more than one file backing the BucketCache. This is very useful specially when the Cache size requirement is high. For multiple backing files, configure ioengine as `files:PATH_TO_FILE1,PATH_TO_FILE2,PATH_T [...]
+Setting it to `offheap` will have BucketCache make its allocations off-heap, and an ioengine setting of `file:PATH_TO_FILE` will direct BucketCache to use file caching (Useful in particular if you have some fast I/O attached to the box such as SSDs). From 2.0.0, it is possible to have more than one file backing the BucketCache. This is very useful especially when the Cache size requirement is high. For multiple backing files, configure ioengine as `files:PATH_TO_FILE1,PATH_TO_FILE2,PATH_ [...]
 
 It is possible to deploy a tiered setup where we bypass the CombinedBlockCache policy and have BucketCache working as a strict L2 cache to the L1 LruBlockCache.
 For such a setup, set `hbase.bucketcache.combinedcache.enabled` to `false`.
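
For concreteness, the file-backed mode described above boils down to an _hbase-site.xml_ entry like the sketch below. The paths are placeholders, and `hbase.bucketcache.size` is the usual companion capacity setting from the same part of the guide rather than something this commit touches, so treat it as an assumption to verify.

[source,xml]
----
<property>
  <!-- Back BucketCache with two SSD-resident files -->
  <name>hbase.bucketcache.ioengine</name>
  <value>files:/mnt/ssd1/bucketcache,/mnt/ssd2/bucketcache</value>
</property>
<property>
  <!-- Total BucketCache capacity, in megabytes (assumed property, not from this commit) -->
  <name>hbase.bucketcache.size</name>
  <value>8192</value>
</property>
----

To get the strict-L2 arrangement mentioned in the last line of the hunk, you would additionally set `hbase.bucketcache.combinedcache.enabled` to `false`.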
@@ -1133,7 +1133,7 @@ As write requests are handled by the region server, they accumulate in an in-mem
 
 Logically, the process of splitting a region is simple. We find a suitable point in the keyspace of the region where we should divide the region in half, then split the region's data into two new regions at that point. The details of the process however are not simple.  When a split happens, the newly created _daughter regions_ do not rewrite all the data into new files immediately. Instead, they create small files similar to symbolic link files, named link:https://hbase.apache.org/devap [...]
 
-Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. T [...]
+Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. T [...]
 
 [[regionserver_split_process_image]]
 .RegionServer Split Process
@@ -1188,14 +1188,14 @@ link:http://en.wikipedia.org/wiki/Write-ahead_logging[Write-Ahead Log] article.
 
 [[wal.providers]]
 ==== WAL Providers
-In HBase, there are a number of WAL imlementations (or 'Providers'). Each is known
+In HBase, there are a number of WAL implementations (or 'Providers'). Each is known
 by a short name label (that unfortunately is not always descriptive). You set the provider in
-_hbase-site.xml_ passing the WAL provder short-name as the value on the
+_hbase-site.xml_ passing the WAL provider short-name as the value on the
 _hbase.wal.provider_ property (Set the provider for _hbase:meta_ using the
 _hbase.wal.meta_provider_ property, otherwise it uses the same provider configured
 by _hbase.wal.provider_).
 
- * _asyncfs_: The *default*. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This _AsyncFSWAL_ provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently ("fan-out") style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should  [...]
+ * _asyncfs_: The *default*. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This _AsyncFSWAL_ provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently ("fan-out") style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should  [...]
  * _filesystem_: This was the default in hbase-1.x releases. It is built on the blocking _DFSClient_ and writes to replicas in classic _DFSCLient_ pipeline mode. In logs it identifies as _FSHLog_ or _FSHLogProvider_.
  * _multiwal_: This provider is made of multiple instances of _asyncfs_ or  _filesystem_. See the next section for more on _multiwal_.
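
Picking a provider is a one-property change. A minimal _hbase-site.xml_ sketch follows; `filesystem` is used only as an example value, and the separate `hbase.wal.meta_provider` entry is optional, as the text above notes.

[source,xml]
----
<property>
  <!-- WAL provider short-name: asyncfs (the 2.x default), filesystem, or multiwal -->
  <name>hbase.wal.provider</name>
  <value>filesystem</value>
</property>
<property>
  <!-- Optional: a separate provider for hbase:meta; otherwise hbase.wal.provider is used -->
  <name>hbase.wal.meta_provider</name>
  <value>filesystem</value>
</property>
----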
 
@@ -1371,7 +1371,7 @@ The default value for this property is `false`.
 By default, WAL tag compression is turned on when WAL compression is enabled.
 You can turn off WAL tag compression by setting the `hbase.regionserver.wal.tags.enablecompression` property to 'false'.
 
-A possible downside to WAL compression is that we lose more data from the last block in the WAL if it ill-terminated
+A possible downside to WAL compression is that we lose more data from the last block in the WAL if it is ill-terminated
 mid-write. If entries in this last block were added with new dictionary entries but we failed persist the amended
 dictionary because of an abrupt termination, a read of this last block may not be able to resolve last-written entries.
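
In configuration terms, the opt-out mentioned above is a single entry; a sketch, with the property name and the 'false' value taken from the text of this hunk:

[source,xml]
----
<property>
  <!-- Disable WAL tag compression while leaving WAL compression itself enabled -->
  <name>hbase.regionserver.wal.tags.enablecompression</name>
  <value>false</value>
</property>
----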
 
diff --git a/src/main/asciidoc/_chapters/hbase_mob.adoc b/src/main/asciidoc/_chapters/hbase_mob.adoc
index 9b67c6e..0e09db1 100644
--- a/src/main/asciidoc/_chapters/hbase_mob.adoc
+++ b/src/main/asciidoc/_chapters/hbase_mob.adoc
@@ -179,10 +179,10 @@ space. The only way to stop using the space of a particular MOB hfile is to ensu
 hold references to it. To do that we need to ensure we have written the current values into a new
 MOB hfile. If our backing filesystem has a limitation on the number of files that can be present, as
 HDFS does, then even if we do not have deletes or updates of MOB cells eventually there will be a
-sufficient number of MOB hfiles that we will need to coallesce them.
+sufficient number of MOB hfiles that we will need to coalesce them.
 
 Periodically a chore in the master coordinates having the region servers
-perform a special major compaction that also handles rewritting new MOB files. Like all compactions
+perform a special major compaction that also handles rewriting new MOB files. Like all compactions
 the Region Server will create updated hfiles that hold both the cells that are smaller than the MOB
 threshold and cells that hold references to the newly rewritten MOB file. Because this rewriting has
 the advantage of looking across all active cells for the region our several small MOB files should
@@ -237,7 +237,7 @@ To determine if a MOB HFile meets the second criteria the chore extracts metadat
 HFiles for each MOB enabled column family for a given table. That metadata enumerates the complete
 set of MOB HFiles needed to satisfy the references stored in the normal HFile area.
 
-The period of the cleaner chore can be configued by setting `hbase.master.mob.cleaner.period` to a
+The period of the cleaner chore can be configured by setting `hbase.master.mob.cleaner.period` to a
 positive integer number of seconds. It defaults to running daily. You should not need to tune it
 unless you have a very aggressive TTL or a very high rate of MOB updates with a correspondingly
 high rate of non-MOB compactions.
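
If you do end up tuning the cleaner chore, it is a single master-side setting; a sketch, where the six-hour value (21600 seconds) is only an example and the daily default corresponds to 86400 seconds:

[source,xml]
----
<property>
  <!-- MOB cleaner chore period, in seconds -->
  <name>hbase.master.mob.cleaner.period</name>
  <value>21600</value>
</property>
----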
@@ -247,7 +247,7 @@ high rate of non-MOB compactions.
 ==== Further limiting write amplification
 
 If your MOB workload has few to no updates or deletes then you can opt-in to MOB compactions that
-optimize for limiting the amount of write amplification. It acheives this by setting a
+optimize for limiting the amount of write amplification. It achieves this by setting a
 size threshold to ignore MOB files during the compaction process. When a given region goes
 through MOB compaction it will evaluate the size of the MOB file that currently holds the actual
 value and skip rewriting the value if that file is over threshold.
@@ -629,7 +629,7 @@ HBase upgrades.
 
 Prior to the work in HBASE-22749, "Distributed MOB compactions", HBase had the Master coordinate all
 compaction maintenance of the MOB hfiles. Centralizing management of the MOB data allowed for space
-optimizations but safely coordinating that managemet with Region Servers resulted in edge cases that
+optimizations but safely coordinating that management with Region Servers resulted in edge cases that
 caused data loss (ref link:https://issues.apache.org/jira/browse/HBASE-22075[HBASE-22075]).
 
 Users of the MOB feature upgrading to a version of HBase that includes HBASE-22749 should be aware