Posted to commits@accumulo.apache.org by el...@apache.org on 2016/06/23 15:12:41 UTC

[1/2] accumulo git commit: Add a paragraph on the smaller keys for index blocks improvement

Repository: accumulo
Updated Branches:
  refs/heads/asf-site 45d81dfc4 -> 604500f88
  refs/heads/gh-pages 6538290f3 -> f30cc51b1


Add a paragraph on the smaller keys for index blocks improvement


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/f30cc51b
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/f30cc51b
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/f30cc51b

Branch: refs/heads/gh-pages
Commit: f30cc51b11c0c2c0289f19579ca78a98753e547f
Parents: 6538290
Author: Josh Elser <el...@apache.org>
Authored: Thu Jun 23 11:11:05 2016 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Jun 23 11:11:05 2016 -0400

----------------------------------------------------------------------
 release_notes/1.7.2.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/f30cc51b/release_notes/1.7.2.md
----------------------------------------------------------------------
diff --git a/release_notes/1.7.2.md b/release_notes/1.7.2.md
index eebdc70..dccb092 100644
--- a/release_notes/1.7.2.md
+++ b/release_notes/1.7.2.md
@@ -31,10 +31,14 @@ There was a bug ([ACCUMULO-4148][ACCUMULO-4148]) where multiple put calls with i
 
 An improvement was introduced to allow a max age before WAL files would be automatically rolled. Without a max age, they could stay open for writing indefinitely, blocking the Hadoop DataNode decomissioning process. For more information, see [ACCUMULO-4004][ACCUMULO-4004].
 
-### Remove copy of cached RFile index blocks
+### Remove unnecessary copy of cached RFile index blocks
 
 Accumulo maintains an cache for file blocks in-memory as a performance optimization. This can be done safely because Accumulo RFiles are immutable, thus their blocks are also immutable. There are two types of these blocks: index and data blocks. Index blocks refer to the b-tree style index inside of each Accumulo RFile, while data blocks contain the sorted Key-Value pairs. In previous versions, when Accumulo extracted an Index block from the in-memory cache, it would copy the data. [ACCUMULO-4164][ACCUMULO-4164] removes this unnecessary copy as the contents are immutable and can be passed by reference. Ensuring that the Index blocks are not copied when accessed from the cache is a big performance gain at the file-access level.
 
+### Analyze Key-length to avoid choosing large Keys for RFile Index blocks
+
+Accumulo's RFile index blocks are made up of a Key which exists in the file and points to that specific location in the corresponding RFile data block. Thus, the size of the RFile index blocks is largely dominated by the size of the Keys which are used by the index. [ACCUMULO-4314][ACCUMULO-4314] is an improvement which uses statistics on the length of the Keys in the Rfile to avoid choosing Keys for the index whose length is greater than three standard deviations for the RFile. By choosing smaller Keys for the index, Accumulo can access the RFile index faster and keep more Index blocks cached in memory. Initial tests showed that with this change, the RFile index size was nearly cut in half.
+
 ### Minor performance improvements.
 
 Tablet servers would previously always hsync at the start of a minor compaction, causing delays in the write pipeline. These additional syncs were determined to provide no additional durability guarantees and have been removed. See [ACCUMULO-4112][ACCUMULO-4112] for additional detail.
@@ -83,3 +87,4 @@ HDFS High-Availability instances, forcing NameNode failover.
 [ACCUMULO-4173]: https://issues.apache.org/jira/browse/ACCUMULO-4173
 [ACCUMULO-4151]: https://issues.apache.org/jira/browse/ACCUMULO-4151
 [ACCUMULO-4164]: https://issues.apache.org/jira/browse/ACCUMULO-4164
+[ACCUMULO-4314]: https://issues.apache.org/jira/browse/ACCUMULO-4314


[2/2] accumulo git commit: Jekyll build from gh-pages:f30cc51

Posted by el...@apache.org.
Jekyll build from gh-pages:f30cc51

Add a paragraph on the smaller keys for index blocks improvement


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/604500f8
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/604500f8
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/604500f8

Branch: refs/heads/asf-site
Commit: 604500f884382a51c34da5bf831c7594b7d84307
Parents: 45d81df
Author: Josh Elser <el...@apache.org>
Authored: Thu Jun 23 11:11:05 2016 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Thu Jun 23 11:11:40 2016 -0400

----------------------------------------------------------------------
 feed.xml                 | 4 ++--
 release_notes/1.7.2.html | 6 +++++-
 2 files changed, 7 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/604500f8/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index 61f86f0..9491b3d 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>https://accumulo.apache.org/</link>
     <atom:link href="https://accumulo.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 23 Jun 2016 11:00:50 -0400</pubDate>
-    <lastBuildDate>Thu, 23 Jun 2016 11:00:50 -0400</lastBuildDate>
+    <pubDate>Thu, 23 Jun 2016 11:11:36 -0400</pubDate>
+    <lastBuildDate>Thu, 23 Jun 2016 11:11:36 -0400</lastBuildDate>
     <generator>Jekyll v3.0.5</generator>
     
   </channel>

http://git-wip-us.apache.org/repos/asf/accumulo/blob/604500f8/release_notes/1.7.2.html
----------------------------------------------------------------------
diff --git a/release_notes/1.7.2.html b/release_notes/1.7.2.html
index c4d8fdc..168579a 100644
--- a/release_notes/1.7.2.html
+++ b/release_notes/1.7.2.html
@@ -205,10 +205,14 @@ upgrade to 1.7 should consider 1.7.2 as a starting point.</p>
 
 <p>An improvement was introduced to allow a max age before WAL files would be automatically rolled. Without a max age, they could stay open for writing indefinitely, blocking the Hadoop DataNode decomissioning process. For more information, see <a href="https://issues.apache.org/jira/browse/ACCUMULO-4004">ACCUMULO-4004</a>.</p>
 
-<h3 id="remove-copy-of-cached-rfile-index-blocks">Remove copy of cached RFile index blocks</h3>
+<h3 id="remove-unnecessary-copy-of-cached-rfile-index-blocks">Remove unnecessary copy of cached RFile index blocks</h3>
 
 <p>Accumulo maintains an cache for file blocks in-memory as a performance optimization. This can be done safely because Accumulo RFiles are immutable, thus their blocks are also immutable. There are two types of these blocks: index and data blocks. Index blocks refer to the b-tree style index inside of each Accumulo RFile, while data blocks contain the sorted Key-Value pairs. In previous versions, when Accumulo extracted an Index block from the in-memory cache, it would copy the data. <a href="https://issues.apache.org/jira/browse/ACCUMULO-4164">ACCUMULO-4164</a> removes this unnecessary copy as the contents are immutable and can be passed by reference. Ensuring that the Index blocks are not copied when accessed from the cache is a big performance gain at the file-access level.</p>
 
+<h3 id="analyze-key-length-to-avoid-choosing-large-keys-for-rfile-index-blocks">Analyze Key-length to avoid choosing large Keys for RFile Index blocks</h3>
+
+<p>Accumulo’s RFile index blocks are made up of a Key which exists in the file and points to that specific location in the corresponding RFile data block. Thus, the size of the RFile index blocks is largely dominated by the size of the Keys which are used by the index. <a href="https://issues.apache.org/jira/browse/ACCUMULO-4314">ACCUMULO-4314</a> is an improvement which uses statistics on the length of the Keys in the Rfile to avoid choosing Keys for the index whose length is greater than three standard deviations for the RFile. By choosing smaller Keys for the index, Accumulo can access the RFile index faster and keep more Index blocks cached in memory. Initial tests showed that with this change, the RFile index size was nearly cut in half.</p>
+
 <h3 id="minor-performance-improvements">Minor performance improvements.</h3>
 
 <p>Tablet servers would previously always hsync at the start of a minor compaction, causing delays in the write pipeline. These additional syncs were determined to provide no additional durability guarantees and have been removed. See <a href="https://issues.apache.org/jira/browse/ACCUMULO-4112">ACCUMULO-4112</a> for additional detail.</p>
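
The key-length heuristic described in the ACCUMULO-4314 paragraph can be illustrated with a small sketch. This is not Accumulo's implementation (Accumulo is written in Java, and its RFile writer is not shown in this commit). The names `KeyLengthStats` and `choose_index_keys` are hypothetical, keys are modeled as plain strings, a data block is closed by key count rather than by byte size, and "greater than three standard deviations" is assumed to mean more than three standard deviations above the running mean key length.

```python
import math


class KeyLengthStats:
    """Running mean and standard deviation of key lengths (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, length):
        self.count += 1
        delta = length - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (length - self.mean)

    def stddev(self):
        # Population standard deviation of the lengths seen so far.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

    def is_unusually_long(self, length, num_devs=3.0):
        # Assumption: "unusually long" means above mean + num_devs * stddev.
        return length > self.mean + num_devs * self.stddev()


def choose_index_keys(keys, block_size):
    """Pick one index key per data block, skipping unusually long keys.

    A block is normally closed after `block_size` keys and its last key
    becomes the index entry. If that key is unusually long, the block is
    kept open until a key of more typical length arrives.
    """
    stats = KeyLengthStats()
    index = []
    in_block = 0
    for key in keys:
        stats.add(len(key))
        in_block += 1
        if in_block >= block_size and not stats.is_unusually_long(len(key)):
            index.append(key)
            in_block = 0
    if in_block > 0:
        # The final partial block still needs an index entry, whatever its length.
        index.append(keys[-1])
    return index
```

In this sketch, a single very long key that happens to fall on a block boundary is passed over in favor of the next key, so the block grows by one entry while the index entry stays small. That trade-off (slightly uneven data blocks in exchange for a smaller index) is the idea the release note describes.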