Posted to commits@accumulo.apache.org by kt...@apache.org on 2019/04/05 19:50:00 UTC

[accumulo-website] branch master updated: Reword and shorten 1.9.3 release notes

This is an automated email from the ASF dual-hosted git repository.

kturner pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/master by this push:
     new dd3c422  Reword and shorten 1.9.3 release notes
dd3c422 is described below

commit dd3c42260ce55a96dcb9b4edc910e26a19d3fc71
Author: Keith Turner <kt...@apache.org>
AuthorDate: Fri Apr 5 15:49:49 2019 -0400

    Reword and shorten 1.9.3 release notes
---
 _posts/release/2018-07-18-accumulo-1.9.3.md | 112 +++++++++++-----------------
 1 file changed, 44 insertions(+), 68 deletions(-)

diff --git a/_posts/release/2018-07-18-accumulo-1.9.3.md b/_posts/release/2018-07-18-accumulo-1.9.3.md
index 5f75777..57739bf 100644
--- a/_posts/release/2018-07-18-accumulo-1.9.3.md
+++ b/_posts/release/2018-07-18-accumulo-1.9.3.md
@@ -16,91 +16,68 @@ Users of any previous version of 1.8 or 1.9 are encouraged to upgrade
 
 ### Multiple Fixes for Write Ahead Logs
 
-This release fixes a number of issues with Write Ahead Logs that slow or prevent recovery 
-and in some cases lead to data loss. The fixes reduce the number of WALS that need to be 
-referenced by a tserver, improve error handing of WAL issues, and improve WAL clean up processes.
+This release fixes Write Ahead Log (WAL) issues that slow or prevent recovery
+and in some cases lead to data loss. The fixes reduce the number of WALs
+referenced by a tserver, improve error handling, and improve cleanup.
 
-+ Eliminates a race condition that could result in data loss during recovery. 
-In cases where the GC deletes unreferenced WAL files as the master is simultaneously 
-attempting to create the list of WALs necessary for recovery, the master will skip 
-files that should be used in the recovery, resulting in data loss. Fixed in [#866]. 
++ Eliminates a race condition that could result in data loss during recovery.
+If the GC deletes unreferenced WALs from ZK while the master is reading
+recovery WALs from ZK, the master may skip WALs it should not, resulting in
+data loss.  Fixed in [#866].
 
-+ If the metadata table has references to a non-existent WAL file, operations 
-for that table will hang. This occurs when files published to zookeeper have existing 
-references but subsequent failures have occurred in the WAL files, or if a tserver 
-removes the file.  Reported in [#949] and fixed in [#1005] [#1057].
++ Opening a new WAL in DFS may fail, but still be advertised in ZK. This could
+result in a missing WAL during recovery, preventing tablets from loading.
+There is no data loss in this case, just WAL references that should not exist.
+Reported in [#949] and fixed in [#1005] [#1057].
 
-+ tserver failures could result in the creation of many files that are unreferenced and 
-never written to, resulting in files with a header but no data. Recovery tries to 
-handle these as "empty" files which slows the recovery process. This was fixed in [#823] [#845].  
++ tserver failures could result in many empty WALs that unnecessarily slow recovery.
+This was fixed in [#823] [#845].
 
-+ When the number of WALs referenced by a tserver exceeds a threshold, the tserver will 
-flush and minor compact the oldest log to limit the number of wal references needed by 
-that tserver. This reduces the number of WALs referenced by many tservers and speeds 
-recovery by reducing the number of WALs needed during recovery.  Addressed by [#854] [#860].
++ Some write patterns caused tservers to unnecessarily reference many WALs,
+which could slow recovery.  In [#854] [#860] the maximum number of WALs
+referenced was limited regardless of the write pattern, avoiding long recovery times.
 
-+ During tablet recovery, filter out logs that do not define the tablet. [#881]   
++ During tablet recovery, filter out logs that do not define the tablet. [#881]
 
-+ If a tserver fails sorting, a marker file is written to the recovery directory. 
-The presence of this marker prevents any subsequent attempts at recovery from succeeding. 
-Fixed by modifying the WAL RecoveryLogReader to handle failed file markers in [#961] [#1048]. 
++ If a tserver fails sorting, a marker file is written to the recovery directory.
+This marker prevents any subsequent recovery attempts from succeeding.
+Fixed by modifying the WAL RecoveryLogReader to handle failed file markers in [#961] [#1048].
 
-+ Use UnsynchronizedBuffer for optimized writeV methods (writeVLong abd writeVInt) 
-in Mutations so that only one write call is necessary to the underlying outputstream. 
-This was mainly done as an attempt to improve WAL performance. It "should" help 
-performance of mutations overall since the Hadoop WriteableUtils method can make 
-multiple write calls per long. [#669]
++ Improve performance of serializing mutations to a WAL by avoiding frequent synchronization. [#669]
 
 ### Multiple Fixes for Compaction Issues
 
-+ Stop locking during compaction.  Compactions were acquiring the tablet lock between each 
-key value. This created unnecessary contention with other operations like scan and 
-bulk imports.  For [#1031] the the synchronization was removed by [#1032].
++ Stop locking during compaction.  Compactions acquired the tablet lock between each
+key-value pair. This created unnecessary contention with other operations like scans and
+bulk imports.  The synchronization was removed in [#1031] [#1032].
 
 + Only re-queue compaction when there is activity. [#759]
 
 ### Fix ArrayOutOfBounds error when new files are created (affects all previous versions)
 
-Accumulo maintains a 1-up counter to keep file names and other identifiers
-unique. This counter is padded with 0 characters when used in file names. If
-the counter becomes sufficiently large, the padding code in versions prior to
-1.9.3 causes an out of bounds error.
-
-Most users will not be affected by this bug, since it requires the counter to
-be very large before the error would be seen. Situations which might cause the
-counter to get very large include: having a very old Accumulo cluster that has
-been running for many years, having a very large cluster, having a cluster that
-writes many files very quickly for long periods of time, having altered the
-current value of the counter in ZooKeeper, or if you experience a bug which
-causes the counter value to skip ahead very rapidly.
-
-If you wish to check to see if you are at risk of being impacted by this bug,
-examine the name of RFiles recently created in your system. If you have one or
-more padded 0 characters (after an initial letter), as in I000zy98.rf or
-I0123abc.rf, you are probably at low risk from this bug.
-
-This issue was fixed in pull request [#562]
+If the 7-digit base-36 number used to name files grew to 8 digits,
+then compactions would fail.  This was fixed in [#562].
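
A minimal sketch of this class of failure (hypothetical code, not Accumulo's actual implementation): padding a base-36 counter to a fixed width of 7 characters breaks as soon as the counter needs an 8th digit.

```java
// Hypothetical illustration only; Accumulo's real padding code differs.
public class CounterPaddingDemo {
  // Pad a base-36 counter to exactly 7 characters.
  static String pad(long counter) {
    String s = Long.toString(counter, 36).toUpperCase();
    // Once s.length() > 7, substring() is asked for a start index past the
    // end of "0000000" and throws an out-of-bounds exception.
    return "0000000".substring(s.length()) + s;
  }

  public static void main(String[] args) {
    System.out.println(pad(1234567L));      // fine: fits in 7 base-36 digits
    System.out.println(pad(78364164096L));  // 36^7, the first 8-digit value: throws
  }
}
```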
 
 ### Updated Master Metrics to include FATE metrics.
 
 Added master metrics to provide a snapshot of current FATE operations.  The metrics added:
-+ the number of current FATE transactions in progress, 
++ the number of current FATE transactions in progress,
 + the count of child operations that have occurred on the zookeeper FATE node
-+ a count of zookeeper connection errors when the snapshot is taken.  
++ a count of zookeeper connection errors when the snapshot is taken.
 
-The number of child operations provides a light-weight surrogate for FATE transaction 
+The number of child operations provides a light-weight surrogate for FATE transaction
 progression between snapshots. The metrics are controlled with the following properties:
 
 * master.fate.metrics.enabled - defaults to _false_ to preserve current metric reporting
 * master.fate.metrics.min.update.interval - defaults to _60s_ - there is a hard lower limit of 10s.
 
-When enabled, the metrics are published to JMX and can optionally be configured using standard 
+When enabled, the metrics are published to JMX and can optionally be configured using standard
 hadoop metrics2 configuration files.
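
As an illustration, enabling these metrics in 1.9 might look like the following in accumulo-site.xml (a sketch assuming the standard Hadoop-style site configuration file; only the two property names listed above come from the release notes):

```xml
<!-- Sketch only: enable FATE metrics reporting on the master. -->
<property>
  <name>master.fate.metrics.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Time between metric snapshots; the notes above mention a hard limit of 10s. -->
  <name>master.fate.metrics.min.update.interval</name>
  <value>60s</value>
</property>
```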
 
 ### Fixed issues with Native Maps with libstdc++ 8.2 and higher
 
-Versions of libstdc++ 8.2 and higher triggered errors within within the native map code. 
-This release fixes issues [#767], [#769] and ...
+Versions of libstdc++ 8.2 and higher triggered errors within the native map code.
+This release fixes issues [#767], [#769], {% ghi 1064 %}, and {% ghi 1070 %}.
 
 ### Fixed splitting tablets with files and no data
 
@@ -116,21 +93,20 @@ In [#978] and [#981] scans that wait too long for files now log a message.
 
 ### Fixed race condition in table existence check.
 
-The Accumulo client code that checks if tables exists had a race 
+The Accumulo client code that checks if tables exist had a race
 condition.  The race was fixed in [#768] and [#973].
 
 ### Support running Mini Accumulo using Java 11
 
-Mini Accumulo made some assumptions about classloaders that were no 
-longer true in Java 11.  This caused Mini to fail in Java 11.  In 
+Mini Accumulo made assumptions about classloaders that are no
+longer true in Java 11, which caused Mini to fail.  In
 [#924] Mini was updated to work with Java 11, while still working
 with Java 7 and 8.
 
 ### Fixed issue with improperly configured Snappy
 
-If snappy was configured for a table and the snappy libraries were not 
-available on the system then this could cause minor compactions to hang
-forever.  In [#920] and [#925] this was fixed and minor 
+If snappy was configured and the snappy libraries were not available, then minor
+compactions could hang forever.  In [#920] and [#925] this was fixed so that minor
 compactions will proceed when a different compression is configured.
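
For reference, switching a table to a different compression codec can be done through the Java client API; the sketch below assumes Accumulo 1.x client classes, and the instance name, ZooKeeper hosts, credentials, and table name are placeholders:

```java
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class SwitchCompression {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details for a 1.x client.
    Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
        .getConnector("root", new PasswordToken("secret"));

    // Configure gzip instead of snappy so compactions use an available codec.
    conn.tableOperations().setProperty("mytable", "table.file.compress.type", "gz");
  }
}
```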
 
 ### Handle bad locality group config.
@@ -142,19 +118,19 @@ inoperative.  This was fixed in [#819] and [#840].
 
 There was a race condition in bulk import that could result in files
 being imported after a bulk import transaction had completed.  In the
-worst case these files were already compacted and garbage collected. 
-This would cause a tablet to have a reference to a file that did not 
+worst case these files were already compacted and garbage collected.
+This would cause a tablet to have a reference to a file that did not
 exist.  No data would have been lost, but it would cause scans to fail.
 The race was fixed in [#800] and [#837].
 
 ### Fixed issue with HostRegexTableLoadBalancer
 
-This addresses an issue when using the HostRegexTableLoadBalancer 
-when the default pool is empty. The load balancer will not assign the tablets at all. 
-Here, we select a random pool to assign the tablets to. This behavior is on by 
-default in the HostRegexTableLoadBalancer but can be disabled via 
+This addresses an issue with the HostRegexTableLoadBalancer
+when the default pool is empty. Previously the load balancer would not assign the tablets at all.
+Now a random pool is selected to assign the tablets to. This behavior is on by
+default in the HostRegexTableLoadBalancer but can be disabled via the
 HostRegexTableLoadBalancer configuration setting
- _table.custom.balancer.host.regex.HostTableLoadBalancer.ALL_  
+ _table.custom.balancer.host.regex.HostTableLoadBalancer.ALL_
  Fixed in [#691] - backported to 1.9 in [#710]
 
 ### Update to libthrift version