Posted to commits@accumulo.apache.org by el...@apache.org on 2015/05/19 18:44:50 UTC

svn commit: r1680341 - /accumulo/site/trunk/content/release_notes/1.7.0.mdtext

Author: elserj
Date: Tue May 19 16:44:50 2015
New Revision: 1680341

URL: http://svn.apache.org/r1680341
Log:
ACCUMULO-3737 Performance improvements

Modified:
    accumulo/site/trunk/content/release_notes/1.7.0.mdtext

Modified: accumulo/site/trunk/content/release_notes/1.7.0.mdtext
URL: http://svn.apache.org/viewvc/accumulo/site/trunk/content/release_notes/1.7.0.mdtext?rev=1680341&r1=1680340&r2=1680341&view=diff
==============================================================================
--- accumulo/site/trunk/content/release_notes/1.7.0.mdtext (original)
+++ accumulo/site/trunk/content/release_notes/1.7.0.mdtext Tue May 19 16:44:50 2015
@@ -16,11 +16,19 @@ Notice:    Licensed to the Apache Softwa
            specific language governing permissions and limitations
            under the License.
 
-Apache Accumulo 1.7.0 is a release that needs to be described
-
 #DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT
 
-# Notable Improvements
+Apache Accumulo 1.7.0 is a major release which includes a number of important milestone features
+that expand on the functionality of Accumulo. These features range from security to availability
+to extensibility.
+
+In the context of Accumulo's Semantic Versioning guidelines, this is a "minor version", which means
+that new APIs have been created, but no deprecated APIs have been removed. Code written against
+1.6.x should work against 1.7.0, possibly after recompilation. As always, the Accumulo
+developers take API compatibility very seriously and have invested much time in ensuring that
+we keep the promises made to our users.
+
+## Major Changes
 
 ### Client Authentication with Kerberos
 
@@ -130,50 +138,15 @@ Created an Accumulo API regular expressi
 this checkstyle rule to ensure they are only using Accumulo's public API. The regular expression can be found in the 
 [README][readme].
 
+## Updated Minimum Versions
 
-## Notable Bug Fixes
-
-### SourceSwitchingIterator Deadlock
-
-An instance of SourceSwitchingIterator, the Accumulo iterator which transparently
-manages whether data for a Tablet is in memory (the in-memory map) or disk (HDFS 
-after a minor compaction), was found deadlocked in a production system.
-
-This deadlock prevented the scan and the minor compaction from ever successfully
-completing without restarting the TabletServer. [ACCUMULO-3745][ACCUMULO-3745]
-fixes the inconsistent synchronization inside of the SourceSwitchingIterator
-to prevent this deadlock from happening in the future.
-
-
-### Table flush blocked indefinitely
-
-While running the Accumulo Randomwalk distributed test, it was observed
-that all activity in Accumulo had stopped and there was an offline
-Accumulo metadata table tablet. The system first tried to flush a user
-tablet but the metadata table was not online (likely due to the agitation
-process which stops and starts Accumulo processes during the test). After
-this call, a call to load the metadata tablet was queued but could not 
-complete until the previous flush call. Thus, a deadlock occurred.
-
-This deadlock happened because the synchronous flush call could not complete
-before the load tablet call completed, but the load tablet call couldn't
-run because of connection caching we perform in Accumulo's RPC layer
-to reduce the quantity of sockets we need to create to send data. 
-[ACCUMULO-3597][ACCUMULO-3597] prevents this deadlock by forcing a
-non-cached connection for the message requesting loads of metadata tablets,
-so that the load request is not blocked behind the pending flush call.
-
-
-## Performance Improvements
-
-### Performance Improvement 1
+Apache Accumulo 1.7.0 comes with an updated set of minimum dependencies.
 
-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
-Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
- dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
- non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
+  * Java 7 is required. Java 6 support is dropped.
+  * Hadoop 2.2.0 or greater is required. Hadoop 1 support is dropped.
+  * ZooKeeper 3.4.x or greater is required.
 
-## General improvements
+## Other Improvements
 
 ### Balancing Groups of Tablets
 
@@ -226,15 +199,74 @@ via its own "Cloudtrace" library, but wa
 has the benefit of timings (spans) already in Accumulo automatically containing
 additional information from the HDFS operations.
 
+## Performance Improvements
+
+### Configurable Threadpool Size for Assignments
+
+One of the primary tasks that the Accumulo Master is responsible for is the
+assignment of Tablets to TabletServers. Before a Tablet can be brought online,
+it must not have any outstanding write-ahead logs, as these indicate a need to perform
+recovery (the Tablet was not unloaded cleanly). Recovery can take some time for
+large write-ahead log files and is performed on a TabletServer to keep the Master
+light and agile.
+
+Assignments, whether the Tablets need to perform recovery or not, share the same
+threadpool in the Master. This means that when a large number of TabletServers are
+available, too few threads dedicated to assignment can restrict the speed at which
+assignments can be performed. [ACCUMULO-1085][ACCUMULO-1085] makes the size of the
+threadpool used in the Master for assignments configurable. The size can be altered
+dynamically to remove the artificial limitation when sufficient servers are available.
+
+### Group-Commit Threshold as a Factor of Data Size
+
+When ingesting data into Accumulo, the majority of time is spent in the write-ahead
+log (WAL). As such, this is a common place for optimizations. One optimization
+is the notion of "group-commit". When multiple clients are writing data to the same
+Accumulo Tablet, it is not efficient for each of them to synchronize on the WAL, flush their
+updates to disk for durability, and then release the lock. The idea of group-commit
+is that multiple writers can queue their mutations to be written to the WAL and
+then wait for a single sync that satisfies the durability constraints of multiple clients
+instead of just one. This drastically improves performance.
+
+In previous versions, Accumulo controlled the frequency at which this group-commit
+sync was performed as a function of the number of clients writing to Accumulo. This was both
+confusing to configure correctly and encouraged sub-par performance with fewer writers.
+[ACCUMULO-1950][ACCUMULO-1950] introduced a new configuration property, `tserver.total.mutation.queue.max`,
+which defines the amount of data that is queued before a group-commit is performed,
+independent of the number of writers. This new configuration property
+is much easier to reason about than the previous, now deprecated, `tserver.mutation.queue.max`.
+
+## Notable Bug Fixes
+
+### SourceSwitchingIterator Deadlock
+
+An instance of SourceSwitchingIterator, the Accumulo iterator which transparently
+manages whether data for a Tablet is in memory (the in-memory map) or disk (HDFS 
+after a minor compaction), was found deadlocked in a production system.
+
+This deadlock prevented the scan and the minor compaction from ever successfully
+completing without restarting the TabletServer. [ACCUMULO-3745][ACCUMULO-3745]
+fixes the inconsistent synchronization inside of the SourceSwitchingIterator
+to prevent this deadlock from happening in the future.
 
 
-## Documentation
+### Table flush blocked indefinitely
 
-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
-Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
- dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
- non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
+While running the Accumulo Randomwalk distributed test, it was observed
+that all activity in Accumulo had stopped and there was an offline
+Accumulo metadata table tablet. The system first tried to flush a user
+tablet but the metadata table was not online (likely due to the agitation
+process which stops and starts Accumulo processes during the test). After
+this call, a call to load the metadata tablet was queued but could not
+complete until the previous flush call finished. Thus, a deadlock occurred.
 
+This deadlock happened because the synchronous flush call could not complete
+before the load tablet call completed, but the load tablet call couldn't
+run because of connection caching we perform in Accumulo's RPC layer
+to reduce the quantity of sockets we need to create to send data. 
+[ACCUMULO-3597][ACCUMULO-3597] prevents this deadlock by forcing a
+non-cached connection for the message requesting loads of metadata tablets,
+so that the load request is not blocked behind the pending flush call.
 
 ## Testing
 
@@ -305,8 +337,10 @@ and, in HDFS High-Availability instances
 [17]: http://en.wikipedia.org/wiki/POODLE
 [18]: https://issues.apache.org/jira/browse/ACCUMULO-3316
 
+[ACCUMULO-1085]: https://issues.apache.org/jira/browse/ACCUMULO-1085
 [ACCUMULO-1798]: https://issues.apache.org/jira/browse/ACCUMULO-1798
 [ACCUMULO-1817]: https://issues.apache.org/jira/browse/ACCUMULO-1817
+[ACCUMULO-1950]: https://issues.apache.org/jira/browse/ACCUMULO-1950
 [ACCUMULO-1957]: https://issues.apache.org/jira/browse/ACCUMULO-1957
 [ACCUMULO-2815]: https://issues.apache.org/jira/browse/ACCUMULO-2815
 [ACCUMULO-2998]: https://issues.apache.org/jira/browse/ACCUMULO-2998