Posted to commits@accumulo.apache.org by bu...@apache.org on 2014/09/19 23:10:54 UTC

svn commit: r922883 - in /websites/staging/accumulo/trunk/content: ./ release_notes/1.5.2.html

Author: buildbot
Date: Fri Sep 19 21:10:53 2014
New Revision: 922883

Log:
Staging update by buildbot for accumulo

Modified:
    websites/staging/accumulo/trunk/content/   (props changed)
    websites/staging/accumulo/trunk/content/release_notes/1.5.2.html

Propchange: websites/staging/accumulo/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Sep 19 21:10:53 2014
@@ -1 +1 @@
-1626327
+1626335

Modified: websites/staging/accumulo/trunk/content/release_notes/1.5.2.html
==============================================================================
--- websites/staging/accumulo/trunk/content/release_notes/1.5.2.html (original)
+++ websites/staging/accumulo/trunk/content/release_notes/1.5.2.html Fri Sep 19 21:10:53 2014
@@ -204,16 +204,48 @@ to benefit from the improvements.</p>
 to the 1.5 line as development has already shifted towards the 1.6 line. For those
 who cannot or do not want to upgrade to 1.6, 1.5.2 is still an excellent choice
 over earlier versions in the 1.5 line.</p>
-<h2 id="notable-improvements">Notable Improvements</h2>
-<p>While new features are typically not added in a bug-fix release as 1.5.2, the
-community does create a variety of improvements that are API compatible. Contained
-here are some of the more notable improvements.</p>
-<h3 id="performance-improvements">Performance improvements</h3>
+<h2 id="performance-improvements">Performance Improvements</h2>
+<p>Apache Accumulo 1.5.2 includes a number of performance-related fixes over previous versions.</p>
+<h3 id="write-ahead-log-sync-performance">Write-Ahead Log sync performance</h3>
 <p>The Write-Ahead Log (WAL) files are used to ensure durability of updates made to Accumulo.
 A "sync" is called on the file in HDFS to make sure that the changes to the WAL are persisted
 to disk, which allows Accumulo to recover in the case of failure. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2766">ACCUMULO-2766</a> fixed
 an issue where an operation against a WAL would unnecessarily wait for multiple syncs, slowing
 down the ingest on the system.</p>
+<h3 id="minor-compactions-not-aggressive-enough">Minor-Compactions not aggressive enough</h3>
+<p>On a system with ample memory provided to Accumulo, long hold times were observed, which
+blocked the ingest of new updates. Trying to free more server-side memory by running minor
+compactions more frequently increased the overall throughput on the node. These changes
+were made in <a href="https://issues.apache.org/jira/browse/ACCUMULO-2905">ACCUMULO-2905</a>.</p>
+<h3 id="heapiterator-optimization">HeapIterator optimization</h3>
+<p>Iterators, a notable feature of Accumulo, are provided to users as a server-side programming
+construct, but are also used internally for numerous server operations. One of these system iterators
+is the HeapIterator, which implements a PriorityQueue of other Iterators. One way this iterator is
+used is to merge multiple files in HDFS to present a single, sorted stream of Key-Value pairs. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2827">ACCUMULO-2827</a>
+introduces a performance optimization to the HeapIterator which can improve the speed of the
+HeapIterator in common cases.</p>
+<h3 id="write-ahead-log-sync-implementation">Write-Ahead log sync implementation</h3>
+<p>In Hadoop-2, two implementations of "sync" are provided: hflush and hsync. Both of these
+methods provide a way to request that the datanodes write the data to the underlying
+medium and not just hold it in memory (the 'fsync' syscall). While both of these methods
+inform the Datanodes to sync the relevant block(s), hflush does not wait for acknowledgement
+from the Datanodes that the sync finished, whereas hsync does. To provide the most reliable system
+"out of the box", Accumulo defaults to hsync so that your data is as secure as possible in
+a variety of situations (notably, unexpected power outages).</p>
+<p>The downside is that performance tends to suffer because waiting for a sync to disk is a very
+expensive operation. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2842">ACCUMULO-2842</a> introduces a new system property, tserver.wal.sync.method,
+that lets users change the HDFS sync implementation from 'hsync' to 'hflush'. Using 'hflush' instead
+of 'hsync' should result in about a 30% increase in ingest performance.</p>
+<p>For users upgrading from Hadoop-1 or Hadoop-0.20 releases, "hflush" is the equivalent of how
+sync was implemented and should give equivalent performance.</p>
+<h3 id="server-side-mutation-queue-size">Server-side mutation queue size</h3>
+<p>When users desire writes to be as durable as possible, using 'hsync', the ingest performance
+of the system can be improved by increasing the tserver.mutation.queue.max property. The cost
+of this change is that it will cause TabletServers to use additional memory per writer. In 1.5.1,
+the value of this parameter defaulted to a conservative 256K, which resulted in sub-par ingest
+performance.</p>
+<p>1.5.2, via <a href="https://issues.apache.org/jira/browse/ACCUMULO-3018">ACCUMULO-3018</a>, increases this buffer to 1M, which has a noticeable impact on
+ingest performance with a minimal increase in TabletServer memory usage.</p>
 <h2 id="notable-bug-fixes">Notable Bug Fixes</h2>
 <h3 id="fixes-mapreduce-package-name-change">Fixes MapReduce package name change</h3>
 <p>1.5.1 inadvertently included a change to RangeInputSplit which created an incompatibility
@@ -240,6 +272,11 @@ never returns. Most of these are related
 <p>The Writable interface methods on the RangeInputSplit class accidentally omitted
 calls to serialize the IteratorSettings configured for the Job. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2962">ACCUMULO-2962</a>
 fixes the serialization and adds some additional tests.</p>
+<h3 id="constraint-violation-causes-hung-scans">Constraint violation causes hung scans</h3>
+<p>A failed bulk import transaction had the ability to create an infinitely retrying
+loop due to a constraint violation. This directly prevents scans from completing,
+but will also hang compactions. <a href="https://issues.apache.org/jira/browse/ACCUMULO-3096">ACCUMULO-3096</a> fixes the issue so that the
+constraint no longer hangs the entire system.</p>
 <h2 id="documentation">Documentation</h2>
 <p>The following documentation updates were made: </p>
 <ul>