Posted to commits@accumulo.apache.org by mw...@apache.org on 2017/12/08 21:57:41 UTC

[accumulo-website] branch master updated: ACCUMULO-4752 Create documentation on improving performance (#46)

This is an automated email from the ASF dual-hosted git repository.

mwalch pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 0a525d5  ACCUMULO-4752 Create documentation on improving performance (#46)
0a525d5 is described below

commit 0a525d52cbd49c45124dd05feefb94828e2ec82f
Author: Mike Walch <mw...@apache.org>
AuthorDate: Fri Dec 8 16:57:39 2017 -0500

    ACCUMULO-4752 Create documentation on improving performance (#46)
    
    * Also, created documentation on RFile along with diagram
---
 _docs-2-0/getting-started/design.md      |  20 +++++++++---
 _docs-2-0/troubleshooting/performance.md |  52 +++++++++++++++++++++++++++++++
 images/docs/rfile_diagram.png            | Bin 0 -> 44053 bytes
 3 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/_docs-2-0/getting-started/design.md b/_docs-2-0/getting-started/design.md
index 01c015e..26e9048 100644
--- a/_docs-2-0/getting-started/design.md
+++ b/_docs-2-0/getting-started/design.md
@@ -107,10 +107,9 @@ ingest and query load is balanced across the cluster.
 When a write arrives at a TabletServer it is written to a Write-Ahead Log and
 then inserted into a sorted data structure in memory called a MemTable. When the
 MemTable reaches a certain size, the TabletServer writes out the sorted
-key-value pairs to a file in HDFS called a Relative Key File (RFile), which is a
-kind of Indexed Sequential Access Method (ISAM) file. This process is called a
-minor compaction. A new MemTable is then created and the fact of the compaction
-is recorded in the Write-Ahead Log.
+key-value pairs to a file in HDFS called an [RFile](#rfile). This process is
+called a minor compaction. A new MemTable is then created and the fact of the
+compaction is recorded in the Write-Ahead Log.
 
 When a request to read data arrives at a TabletServer, the TabletServer does a
 binary search across the MemTable as well as the in-memory indexes associated
@@ -118,6 +117,18 @@ with each RFile to find the relevant values. If clients are performing a scan,
 several key-value pairs are returned to the client in order from the MemTable
 and the set of RFiles by performing a merge-sort as they are read.
 
+## RFile
+
+RFile (short for Relative Key File) is a file that contains Accumulo's sorted key-value
+pairs. RFiles are written to HDFS by TabletServers during minor and major compactions. RFiles are
+organized using the Indexed Sequential Access Method (ISAM). RFiles consist of data (key/value) blocks,
+index blocks (which are used to find data blocks), and meta blocks (which contain
+metadata for bloom filters and summary statistics). Data in an RFile is separated by
+locality group. The diagram below shows the logical view and HDFS file view of an RFile.
+
+![rfile diagram]({{ site.url }}/images/docs/rfile_diagram.png)
+<!-- Source at https://docs.google.com/presentation/d/1w9BgfgUtZ-3M14K-lIgv0UmvnOhVg10Zof6AUi-7pcc/edit?usp=sharing -->
+
 ## Compactions
 
 In order to manage the number of files per tablet, periodically the TabletServer
@@ -167,4 +178,3 @@ TabletServer failures are noted on the Master's monitor page, accessible via
 [clients]: {{page.docs_baseurl}}/getting-started/clients
 [merging]: {{page.docs_baseurl}}/getting-started/table_configuration#merging-tablets
 [compaction]: {{page.docs_baseurl}}/getting-started/table_configuration#compaction
-
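The RFile section added to design.md above says index blocks are used to find data blocks. The lookup that such an ISAM-style index enables can be sketched in plain Java. This is a hypothetical, simplified model: `IsamLookupSketch` and `DataBlock` are illustrative names, not Accumulo API. It assumes a single index level in which each entry records the last key of its data block, and it ignores compression, multi-level indexes, locality groups, and full Accumulo keys. A point lookup binary-searches the index, then reads just one data block.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of an ISAM-style lookup like the one an
 * RFile index enables. Not Accumulo code: real RFiles use multi-level
 * indexes, compressed blocks, locality groups, and full Accumulo keys.
 */
public class IsamLookupSketch {

    /** One data block: sorted keys with parallel values. */
    static final class DataBlock {
        final String[] keys;
        final String[] values;

        DataBlock(String[] keys, String[] values) {
            this.keys = keys;
            this.values = values;
        }
    }

    private final List<DataBlock> blocks;
    /** Assumption: index entry i holds the last (largest) key of data block i. */
    private final String[] index;

    IsamLookupSketch(List<DataBlock> blocks) {
        this.blocks = blocks;
        this.index = new String[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            String[] k = blocks.get(i).keys;
            index[i] = k[k.length - 1];
        }
    }

    /** Binary-searches the index, then searches only the one candidate data block. */
    String get(String key) {
        int pos = Arrays.binarySearch(index, key);
        // Not found: -pos - 1 is the first block whose last key sorts after the key.
        int blockNo = pos >= 0 ? pos : -pos - 1;
        if (blockNo >= blocks.size()) {
            return null; // key sorts after every key in the file
        }
        DataBlock block = blocks.get(blockNo); // the only block "read" in this model
        int i = Arrays.binarySearch(block.keys, key);
        return i >= 0 ? block.values[i] : null;
    }

    public static void main(String[] args) {
        IsamLookupSketch file = new IsamLookupSketch(Arrays.asList(
            new DataBlock(new String[] {"apple", "banana"}, new String[] {"1", "2"}),
            new DataBlock(new String[] {"cherry", "date"}, new String[] {"3", "4"}),
            new DataBlock(new String[] {"fig", "grape"}, new String[] {"5", "6"})));
        System.out.println(file.get("date"));  // prints "4" (found in block 1)
        System.out.println(file.get("elder")); // prints "null" (only block 2 is checked)
    }
}
```

Only one data block is searched per lookup, which is why keeping indexes in cache matters so much for read latency.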
diff --git a/_docs-2-0/troubleshooting/performance.md b/_docs-2-0/troubleshooting/performance.md
new file mode 100644
index 0000000..f6dd705
--- /dev/null
+++ b/_docs-2-0/troubleshooting/performance.md
@@ -0,0 +1,52 @@
+---
+title: Performance
+category: troubleshooting
+order: 5
+---
+
+Accumulo can be tuned to improve read and write performance.
+
+## Read performance
+
+1. Enable [caching] on tables to reduce reads to disk.
+
+1. Enable [bloom filters][bloom-filters] on tables to limit the number of disk lookups.
+
+1. Decrease the [major compaction ratio][compaction] of a table to decrease the number of
+   files per tablet. Fewer files reduce the latency of reads.
+
+1. Decrease the size of [data blocks in RFiles][rfile] by lowering [table.file.compress.blocksize], which can result
+   in better random seek performance. However, this can increase the size of indexes in the RFile. If the indexes
+   are too large to fit in cache, read performance can suffer. Also, as the index size increases, the depth of the
+   index tree in each file may increase. Increasing [table.file.compress.blocksize.index] can reduce the depth of
+   the tree.
+
+## Write performance
+
+1. Enable [native maps][native-maps] on tablet servers to prevent Java garbage collection pauses
+   which can slow ingest.
+
+1. [Pre-split new tables][split] to distribute writes across multiple tablet servers.
+
+1. Ingest data using [multiple clients][multi-client] or [bulk ingest][bulk] to increase ingest throughput.
+
+1. Increase the [major compaction ratio][compaction] of a table to limit the number of major compactions,
+   which improves ingest performance.
+
+1. On large Accumulo clusters, use [multiple HDFS volumes][multivolume] to increase write performance.
+
+1. Change the compression format used by [blocks in RFiles][rfile] by setting [table.file.compress.type] to
+   `snappy`. This increases write speed at the expense of using more disk space.
+
+[caching]: {{ page.docs_baseurl }}/administration/caching
+[bloom-filters]: {{ page.docs_baseurl }}/getting-started/table_configuration#bloom-filters
+[compaction]: {{ page.docs_baseurl }}/getting-started/table_configuration#compaction
+[rfile]: {{ page.docs_baseurl }}/getting-started/design#rfile
+[native-maps]: {{ page.docs_baseurl }}/administration/in-depth-install#native-map
+[split]: {{ page.docs_baseurl }}/getting-started/table_configuration#pre-splitting-tables
+[multi-client]: {{ page.docs_baseurl }}/development/high_speed_ingest#multiple-ingest-clients
+[bulk]: {{ page.docs_baseurl }}/development/high_speed_ingest#bulk-ingest
+[multivolume]: {{ page.docs_baseurl }}/administration/multivolume
+[table.file.compress.blocksize]: {{ page.docs_baseurl }}/administration/properties#table_file_compress_blocksize
+[table.file.compress.blocksize.index]: {{ page.docs_baseurl }}/administration/properties#table_file_compress_blocksize_index
+[table.file.compress.type]: {{ page.docs_baseurl }}/administration/properties#table_file_compress_type
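The two compaction ratio items in performance.md pull in opposite directions (decrease the ratio for reads, increase it for writes). The hypothetical Java sketch below models the file-selection rule commonly described for `table.compaction.major.ratio`: a set of files is compacted when the largest file's size times the ratio is at most the total size of the set; otherwise the largest file is dropped and the check repeats. It is a simplification under that assumption, not Accumulo's compaction code, and it ignores limits such as `table.file.max`. The class name and file sizes are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified model of ratio-based major compaction file
 * selection. Not Accumulo's implementation.
 */
public class CompactionRatioSketch {

    /** Returns the file sizes selected for compaction, or an empty list if none qualify. */
    static List<Long> selectFiles(List<Long> fileSizes, double ratio) {
        List<Long> candidates = new ArrayList<>(fileSizes);
        Collections.sort(candidates); // ascending, so the largest file is last
        while (candidates.size() > 1) {
            long total = 0;
            for (long size : candidates) {
                total += size;
            }
            long largest = candidates.get(candidates.size() - 1);
            if (largest * ratio <= total) {
                return candidates;
            }
            // The largest file dominates the set: drop it and retry with the smaller files.
            candidates.remove(candidates.size() - 1);
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(100L, 20L, 15L, 10L); // hypothetical sizes in MB
        System.out.println(selectFiles(sizes, 3.0)); // prints "[]"
        System.out.println(selectFiles(sizes, 1.5)); // prints "[10, 15, 20]"
    }
}
```

In this model a ratio of 3.0 compacts nothing and leaves four files, while 1.5 merges the three smaller files and leaves two. A lower ratio therefore keeps fewer files per tablet at the cost of more compaction work during ingest.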
diff --git a/images/docs/rfile_diagram.png b/images/docs/rfile_diagram.png
new file mode 100644
index 0000000..511d72c
Binary files /dev/null and b/images/docs/rfile_diagram.png differ
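The block-size item in performance.md says smaller data blocks enlarge the RFile index and may deepen the index tree, and that a larger `table.file.compress.blocksize.index` can flatten it. The back-of-envelope Java sketch below illustrates that arithmetic. It is a hypothetical model: one fixed-size index entry per data block (100 bytes assumed), index blocks filled exactly to the configured size, and no compression. Real RFile index sizes depend on key lengths and encoding, and `IndexDepthSketch` is an illustrative name only.

```java
/**
 * Hypothetical back-of-envelope model of RFile index depth. Assumes one
 * fixed-size index entry per data block and index blocks filled exactly
 * to the configured size; not Accumulo's actual on-disk format.
 */
public class IndexDepthSketch {

    /** Number of index levels needed until all entries fit in one root index block. */
    static int indexDepth(long fileBytes, long dataBlockBytes,
                          long indexBlockBytes, long indexEntryBytes) {
        long entries = ceilDiv(fileBytes, dataBlockBytes); // one entry per data block
        long fanOut = Math.max(2, indexBlockBytes / indexEntryBytes); // entries per index block
        int depth = 1;
        while (entries > fanOut) {
            // Each index block at this level becomes one entry in the level above.
            entries = ceilDiv(entries, fanOut);
            depth++;
        }
        return depth;
    }

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long file = 1L << 30; // hypothetical 1 GiB file
        long entry = 100;     // assumed bytes per index entry
        System.out.println(indexDepth(file, 100_000, 32_000, entry)); // prints 2
        System.out.println(indexDepth(file, 4_000, 32_000, entry));   // prints 3
        System.out.println(indexDepth(file, 4_000, 128_000, entry));  // prints 2
    }
}
```

Under these assumptions, shrinking data blocks from 100,000 to 4,000 bytes adds an index level when index blocks hold 32,000 bytes, and raising the index block size to 128,000 bytes removes it again.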
