Posted to commits@accumulo.apache.org by kt...@apache.org on 2021/02/02 15:17:07 UTC

[accumulo-website] branch main updated: Describes 2.1.0 compaction changes in rel notes (#261)

This is an automated email from the ASF dual-hosted git repository.

kturner pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/main by this push:
     new 6f1922b  Describes 2.1.0 compaction changes in rel notes (#261)
6f1922b is described below

commit 6f1922b00a398ccb2b134c7d359ecee0866d1164
Author: Keith Turner <kt...@apache.org>
AuthorDate: Tue Feb 2 10:17:00 2021 -0500

    Describes 2.1.0 compaction changes in rel notes (#261)
---
 _posts/release/2020-01-19-accumulo-2.1.0.md | 35 ++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/_posts/release/2020-01-19-accumulo-2.1.0.md b/_posts/release/2020-01-19-accumulo-2.1.0.md
index 3a587c6..460c339 100644
--- a/_posts/release/2020-01-19-accumulo-2.1.0.md
+++ b/_posts/release/2020-01-19-accumulo-2.1.0.md
@@ -7,12 +7,45 @@ draft: true
 
 ## Notable Changes
 
+### Compaction Changes
+
+Significant changes were made to how Accumulo compacts files in this release.  See
+{% dlink administration/compaction %} for details; some highlights are below.
+
+ * Multiple concurrent compactions per tablet on disjoint files are now
+   supported.  Previously only a single compaction could run on a tablet.  This
+   allows tablets that are running long compactions on large files to
+   concurrently compact new smaller files that arrive.
+ * Multiple compaction thread pools per tablet server are now supported.
+   Previously only a single thread pool existed within a tablet server for
+   compactions.  With a single thread pool, long compactions that occupy all
+   threads can starve quick compactions.  Now compactions with little data
+   can be processed by dedicated thread pools.
+ * Accumulo's default algorithm for selecting files to compact was modified to
+   select the smallest set of files that meet the compaction ratio criteria
+   instead of the largest set.  This change makes tablets more aggressive about
+   reducing their number of files while still doing logarithmic compaction work.
+   This change also enables efficiently compacting new small files that arrive
+   during a long running compaction. 
+ * Having dedicated compaction thread pools for tables is now supported
+   through configuration.  The default configuration for Accumulo sets up
+   dedicated thread pools for compacting the Accumulo metadata table.
+ * Merging minor compactions were dropped.  These were added to Accumulo to
+   address the problem of new files arriving while a long running compaction
+   was running.  Merging minor compactions could cause O(N^2) compaction work.
+   The new compaction changes in this release can satisfy this use case while
+   doing a logarithmic amount of work.
+
+CompactionStrategy was deprecated in favor of new public APIs.
+See its [javadoc]({% jurl org.apache.accumulo.tserver.compaction.CompactionStrategy %}) 
+for more information.
+
 ### Fixed GC Metadata hotspots
 
 Prior to this release, Accumulo stored GC file candidates in the metadata table
 using rows of the form `~del<URI>`. This row schema led to uneven load on
 the metadata table and metadata tablets that were eventually never used. In {%
-ghi 1043 %} the row fromat was changed to `~del<hash(URI)><URI>` resulting in
+ghi 1043 %} the row format was changed to `~del<hash(URI)><URI>` resulting in
 even load on the metadata table and even data spread in the tablets. After
 upgrading, there may still be splits in the metadata table using the old row
 format. These splits can be merged away as shown in the example below which