Posted to commits@accumulo.apache.org by bu...@apache.org on 2015/05/20 00:32:27 UTC
svn commit: r951945 - in /websites/staging/accumulo/trunk/content: ./ release_notes/1.7.0.html

Author: buildbot
Date: Tue May 19 22:32:26 2015
New Revision: 951945

Log:
Staging update by buildbot for accumulo

Modified:
    websites/staging/accumulo/trunk/content/   (props changed)
    websites/staging/accumulo/trunk/content/release_notes/1.7.0.html

Propchange: websites/staging/accumulo/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue May 19 22:32:26 2015
@@ -1 +1 @@
-1680406
+1680412

Modified: websites/staging/accumulo/trunk/content/release_notes/1.7.0.html
==============================================================================
--- websites/staging/accumulo/trunk/content/release_notes/1.7.0.html (original)
+++ websites/staging/accumulo/trunk/content/release_notes/1.7.0.html Tue May 19 22:32:26 2015
@@ -213,13 +213,14 @@ Latest 1.5 release: <strong>1.5.2</stron
     <h1 class="title">Apache Accumulo 1.7.0 Release Notes</h1>
 
     <p>Apache Accumulo 1.7.0 is a major release which includes a number of important milestone features
-that expand on the functionality of Accumulo. These features range from security to availability
+that expand on the core functionality of Accumulo. These features range from security to availability
 to extendability. Nearly 700 JIRA issues were resolved with the release of this version: approximately
 two-thirds of which were bugs and one third were improvements.</p>
 <p>In the context of Accumulo's <a href="http://semver.org">Semantic Versioning</a> <a href="https://github.com/apache/accumulo/blob/1.7.0/README.md#api">guidelines</a>, this is a "minor version"
-which means that new APIs have been created, but no deprecated APIs have been removed. Code written against
-1.6.x should work against 1.7.0, possibly with a re-compilation. As always, the Accumulo
-developers take API compatibility very seriously and have invested much time in ensuring that
+which means that new APIs have been created, some deprecations may have been added, but no deprecated APIs
+have been removed. Code written against
+1.6.x should work against 1.7.0, likely binary-compatible but definitely source-compatible. As always, the Accumulo
+developers take API compatibility very seriously and have invested significant time and effort in ensuring that
 we meet the promises set forward to our users.</p>
 <h2 id="major-changes">Major Changes</h2>
 <h3 id="client-authentication-with-kerberos">Client Authentication with Kerberos</h3>
@@ -228,39 +229,41 @@ and other related components. Kerberos r
 to authentication users who have credentials provided by an administrator. When Hadoop is
 configured for use with Kerberos, all users must provide Kerberos credentials to interact
 with the filesystem, launch YARN jobs, or even view certain web pages.</p>
-<p>While Accumulo has long supported operation on Hadoop with Kerberos enabled, it required
-Accumulo users to still use password-based authentication. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2815">ACCUMULO-2815</a>
-added support allowing Accumulo clients to use a single set of Kerberos
-credentials to interact with Accumulo and all other Hadoop components.</p>
+<p>While Accumulo has long supported operation on Kerberos-enabled HDFS, it still required
+Accumulo users to use password-based authentication. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2815">ACCUMULO-2815</a>
+added support that allows Accumulo clients to use their existing Kerberos
+credentials to interact with Accumulo and all other Hadoop components instead of 
+a separate username and password for Accumulo.</p>
 <p>This authentication leverages the <a href="http://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer">Simple Authentication and Security Layer (SASL)</a>
-and <a href="http://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface">GSSAPI</a> to support Kerberos authentication over the existing Thrift-base
-RPC infrastructure that Accumulo leverages.</p>
-<p>These additions represent a significant forward step for Accumulo, bringing it up to
-speed with the rest of the Hadoop components. This results in a much more cohesive
-security story for Accumulo that resonates with the battle-tested cell-level security
-and authorization module.</p>
+and <a href="http://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface">GSSAPI</a> interface to support Kerberos authentication over the existing Thrift-based
+RPC infrastructure that Accumulo uses.</p>
+<p>These additions represent a significant forward step for Accumulo, bringing its client-authentication
+up to speed with the rest of the Hadoop ecosystem. This results in a much more cohesive
+authentication story for Accumulo that resonates with the battle-tested cell-level security
+and authorization components Accumulo users are very familiar with already.</p>
 <p>More information on configuration, administration and application of Kerberos client
 authentication can be found in the <a href="http://accumulo.staging.apache.org/1.7/accumulo_user_manual.html#_kerberos">Kerberos chapter</a> of the Accumulo
 User Manual.</p>
 <h3 id="data-center-replication">Data-Center Replication</h3>
-<p>In previous releases, Accumulo only operated withing the constraints of a single installation.
+<p>In previous releases, Accumulo only operated within the constraints of a single installation.
 Because single instances of Accumulo often consist of many nodes and Accumulo's design scales
-efficiently across many nodes, it is typical that one Accumulo is run per physical installation
+(near) linearly across many nodes, it is typical that one Accumulo is run per physical installation
 or data-center. <a href="https://issues.apache.org/jira/browse/ACCUMULO-378">ACCUMULO-378</a> introduces support in Accumulo to automatically
 copy data from one Accumulo instance to another.</p>
-<p>This data-center replication feature is directly applicable for users wishing to implement
-a disaster recovery strategy. Data can be automatically sent from a primary instance to one
-or more other Accumulo instances. Conversely to normal Accumulo operation in which ingest and
-query are strongly consistent, replication is a lazy, eventually consistent operation. This
+<p>This data-center replication feature is primarily applicable to users wishing to implement
+a disaster recovery strategy. Data can be automatically copied from a primary instance to one
+or more secondary Accumulo instances. Where normal Accumulo ingest and
+queries are strongly consistent, data-center replication is a lazy, eventually consistent operation. This
 is desirable for replication as it prevents additional latency for ingest operations on the
-primary instance. Additionally, outages between the primary instance and replicas can sustain
+primary instance. Additionally, network outages between the primary instance and replicas can sustain
 prolonged outages without any administrative overhead.</p>
 <p>The Accumulo User Manual contains a <a href="http://accumulo.staging.apache.org/1.7/accumulo_user_manual.html#_replication">new chapter on replication</a> which covers
-in better detail the design and implementation of the feature, how users can configure replication
-and special cases to consider when choosing to integrate the feature into a user application.</p>
+in great detail the design and implementation of the feature, how users can configure replication
+and special cases to consider when choosing to integrate the feature into user applications.</p>
 <h3 id="user-initiated-compaction-strategies">User Initiated Compaction Strategies</h3>
-<p>Per table compaction strategies were added in 1.6.0.  In 1.7.0 the ability to
-specify a compaction strategy for a user initiated compaction was added in
+<p>Per-table compaction strategies were added in 1.6.0 to provide custom logic for choosing which
+files are selected for a major compaction.  In 1.7.0, the ability to
+specify a compaction strategy for a user-initiated compaction was added in
 <a href="https://issues.apache.org/jira/browse/ACCUMULO-1798">ACCUMULO-1798</a>.   This allows surgical compactions on a subset 
 of tablet files.  Previously a user initiated compaction would compact all 
 files in a tablet.</p>
@@ -297,19 +300,19 @@ were also added to the shell to specify
 specify an arbitrary compaction strategy is mutually exclusive with the file selection 
 options and file creation options.</p>
 <h3 id="api-clarification">API Clarification</h3>
-<p>The declared API in 1.6.x was incomplete. Some important classes like ColumnVisibility were not declared as Accumulo API. A lot 
-of work was done under <a href="https://issues.apache.org/jira/browse/ACCUMULO-3657">ACCUMULO-3657</a> to correct the API statement and clean up the API. The expanded and 
-simplified API statement is in the <a href="https://github.com/apache/accumulo/blob/1.7.0/README.md">README</a>.</p>
-<p>In some places in the API, non API types were used. Ideally public API members would only use public API types. A tool called 
+<p>The declared API in 1.6.x was incomplete. Some important classes like ColumnVisibility were not declared as Accumulo API. Significant 
+work was done under <a href="https://issues.apache.org/jira/browse/ACCUMULO-3657">ACCUMULO-3657</a> to correct the API statement and clean up the API to be representative of
+all classes which users are intended to interact with. The expanded and simplified API statement is in the <a href="https://github.com/apache/accumulo/blob/1.7.0/README.md#api">README</a>.</p>
+<p>In some places in the API, non-API types were used. Ideally, public API members would only use public API types. A tool called 
 <a href="http://code.revelc.net/apilyzer-maven-plugin/">APILyzer</a> was created to find all API members that used non-API types. Many of the violations found by this tool were 
-deprecated to clearly communicate that a non API type was used. For example, a few public API methods returned a class called 
-KeyExtent. KeyExtent was never intended to be in the public API, it contains a lot of code related to Accumulo internals. KeyExtent 
-and the API methods returning it were deprecated. These were replaced with a new way of identifying tablets that does not expose 
-internals. Deprecating a type like this from the API makes the API more stable and makes it easier for contributors to change 
+deprecated to clearly communicate that a non-API type was used. One example is a public API method that returned a class called 
+<code>KeyExtent</code>. <code>KeyExtent</code> was never intended to be in the public API because it contains code related to Accumulo internals. KeyExtent 
+and the API methods returning it have since been deprecated. These were replaced with a new class for identifying tablets that does not expose 
+the internals like <code>KeyExtent</code> did. Deprecating a type like this from the API makes the API more stable while also making it easier for contributors to change 
 Accumulo internals w/o impacting the API.</p>
-<p>Created an Accumulo API regular expression for use with checkstyle. Starting with 1.7.0, projects building on Accumulo can use 
-this checkstyle rule to ensure they are only using Accumulo's public API. The regular expression can be found in the 
-<a href="https://github.com/apache/accumulo/blob/1.7.0/README.md">README</a>.</p>
+<p>The changes in <a href="https://issues.apache.org/jira/browse/ACCUMULO-3657">ACCUMULO-3657</a> also included an Accumulo API regular expression for use with checkstyle. Starting
+with 1.7.0, projects building on Accumulo can use this checkstyle rule to ensure they are only using Accumulo's public API.
+The regular expression can be found in the <a href="https://github.com/apache/accumulo/blob/1.7.0/README.md#api">README</a>.</p>
 <h3 id="updated-minimum-versions">Updated Minimum Versions</h3>
 <p>Apache Accumulo 1.7.0 comes with an updated set of minimum dependencies.</p>
 <ul>
@@ -319,43 +322,58 @@ this checkstyle rule to ensure they are
 </ul>
 <h2 id="other-improvements">Other improvements</h2>
 <h3 id="balancing-groups-of-tablets">Balancing Groups of Tablets</h3>
-<p>By default Accumulo evenly spreads each tables tablets across a cluster.  In some 
-situations its advantageous for query or ingest to evenly spreads groups of tablets 
-within a table.  For <a href="https://issues.apache.org/jira/browse/ACCUMULO-3439">ACCUMULO-3439</a> a new balancer was added to evenly 
-spread groups of tablets.  This Apache <a href="https://blogs.apache.org/accumulo/entry/balancing_groups_of_tablets">blog post</a> provides more 
-details.</p>
+<p>By default, Accumulo evenly spreads each table's tablets across a cluster.  In some 
+situations, it's advantageous for query or ingest to evenly spread groups of tablets 
+within a table.  For <a href="https://issues.apache.org/jira/browse/ACCUMULO-3439">ACCUMULO-3439</a>, a new balancer was added to evenly 
+spread groups of tablets for the purposes of optimizing performance.  This
+<a href="https://blogs.apache.org/accumulo/entry/balancing_groups_of_tablets">blog post</a> provides more details about when and why users may desire
+to leverage this feature.</p>
 <h3 id="user-specified-durability">User-specified Durability</h3>
-<p>Prior to 1.7, a user could configure the durability for individual tables. With the implementation of
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-1957">ACCUMULO-1957</a>, the durability can be specified by the user when creating a BatchWriter. 
-This can result in substantially faster ingest rates when the durability can 
-be relaxed, such as when power is redundantly available to the cluster.</p>
+<p>Accumulo constantly tries to balance durability with performance. These are difficult problems
+because guaranteeing durability of every write to Accumulo is very difficult in a massively-concurrent
+environment that requires high throughput. One common area to focus this attention is the write-ahead log
+as it must eventually call <code>fsync</code> on the local filesystem to guarantee that data written is durable in the face
+of unexpected power failures. In some cases where durability can be sacrificed, either due to the nature
+of the data itself or redundant power supplies, ingest performance improvements can be attained.</p>
+<p>Prior to 1.7, a user could configure the level of durability for individual tables. With the implementation of
+<a href="https://issues.apache.org/jira/browse/ACCUMULO-1957">ACCUMULO-1957</a>, durability is a first-class member on the <code>BatchWriter</code>. All <code>Mutations</code> written
+using that <code>BatchWriter</code> will be written with the provided durability. This can result in substantially faster
+ingest rates when the durability can be relaxed.</p>
 <h3 id="waitforbalance-api">waitForBalance API</h3>
-<p>When creating a new Accumulo table, the common next step is to add splits to the
+<p>When creating a new Accumulo table, the next step is typically adding splits to that
 table before starting ingest. This can be extremely important as a table without
-any splits will only be hosted on a single TabletServer. Adding many splits will
-ensure that a table is distributed across many servers.</p>
-<p>In previous versions, adding splits to a table is synchronous, but the assignment
-of those splits was asynchronous. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2998">ACCUMULO-2998</a> adds a new method
+any splits will only be hosted on a single TabletServer and create an ingest bottleneck
+until the table begins to naturally split. Adding many splits before ingesting will
+ensure that a table is distributed across many servers and result in high throughput
+when ingest first starts.</p>
+<p>Adding splits to a table has long been a synchronous operation, but the assignment
+of those splits was asynchronous. A large number of splits could be processed, but
+it was not guaranteed that they would be evenly distributed resulting in the same problem
+as having an insufficient number of splits. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2998">ACCUMULO-2998</a> adds a new method
 to <code>InstanceOperations</code> which allows users to wait for all tablets to be balanced.
 This method lets users wait until tablets are appropriately distributed so that
-ingest can be started unhampered.</p>
+ingest can be run at full-bore immediately.</p>
 <h3 id="hadoop-metrics2-support">Hadoop Metrics2 Support</h3>
-<p>Accumulo has long had its own metrics system integration using Java MBeans. This
+<p>Accumulo has long had its own metrics system implemented using Java MBeans. This
 enabled metrics to be reported by Accumulo services, but consumption by other systems
 often required use of an additional tool like jmxtrans to read the metrics from the
-MBean and send them to some other system.</p>
-<p><a href="https://issues.apache.org/jira/browse/ACCUMULO-1817">ACCUMULO-1817</a> switches Accumulo to replace the custom MBean code
+MBeans and send them to some other system.</p>
+<p><a href="https://issues.apache.org/jira/browse/ACCUMULO-1817">ACCUMULO-1817</a> replaces this custom metrics system
 with Hadoop Metrics2. Metrics2 has a number of benefits, the most common of which
 is invalidating the need for an additional process to send metrics to common metrics
 storage and visualization tools. With Metrics2 support, Accumulo can send its
 metrics to common tools like Ganglia and Graphite.</p>
+<p>For more information on enabling Hadoop Metrics2, see the <a href="http://accumulo.staging.apache.org/1.7/accumulo_user_manual.html#_metrics">Metrics Chapter</a>
+in the Accumulo User Manual.</p>
 <h3 id="distributed-tracing-with-htrace">Distributed Tracing with HTrace</h3>
 <p>HTrace has recently started gaining traction as a standalone project, especially
 with its adoption in HDFS. Accumulo has long had distributed tracing support
-via its own "Cloudtrace" library, but wasn't intended to use outside of Accumulo.</p>
+via its own "Cloudtrace" library, but this wasn't intended for use outside of Accumulo.</p>
 <p><a href="https://issues.apache.org/jira/browse/ACCUMULO-898">ACCUMULO-898</a> replaces Accumulo's Cloudtrace code with HTrace. This
-has the benefit of timings (spans) already in Accumulo automatically containing
-additional information from the HDFS operations.</p>
+has the benefit of adding timings (spans) from HDFS into Accumulo spans automatically.</p>
+<p>Users who inspect traces via the Accumulo Monitor (or another system) will begin to
+see timings from HDFS during operations like Major and Minor compactions when running
+with at least Apache Hadoop 2.6.0.</p>
 <h2 id="performance-improvements">Performance Improvements</h2>
 <h3 id="configurable-threadpool-size-for-assignments">Configurable Threadpool Size for Assignments</h3>
 <p>One of the primary tasks that the Accumulo Master is responsible for is the