Posted to commits@accumulo.apache.org by kt...@apache.org on 2019/04/05 19:50:00 UTC
[accumulo-website] branch master updated: Reword and shorten 1.9.3 release notes
This is an automated email from the ASF dual-hosted git repository.
kturner pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git
The following commit(s) were added to refs/heads/master by this push:
new dd3c422 Reword and shorten 1.9.3 release notes
dd3c422 is described below
commit dd3c42260ce55a96dcb9b4edc910e26a19d3fc71
Author: Keith Turner <kt...@apache.org>
AuthorDate: Fri Apr 5 15:49:49 2019 -0400
Reword and shorten 1.9.3 release notes
---
_posts/release/2018-07-18-accumulo-1.9.3.md | 112 +++++++++++-----------------
1 file changed, 44 insertions(+), 68 deletions(-)
diff --git a/_posts/release/2018-07-18-accumulo-1.9.3.md b/_posts/release/2018-07-18-accumulo-1.9.3.md
index 5f75777..57739bf 100644
--- a/_posts/release/2018-07-18-accumulo-1.9.3.md
+++ b/_posts/release/2018-07-18-accumulo-1.9.3.md
@@ -16,91 +16,68 @@ Users of any previous version of 1.8 or 1.9 are encouraged to upgrade
### Multiple Fixes for Write Ahead Logs
-This release fixes a number of issues with Write Ahead Logs that slow or prevent recovery
-and in some cases lead to data loss. The fixes reduce the number of WALS that need to be
-referenced by a tserver, improve error handing of WAL issues, and improve WAL clean up processes.
+This release fixes Write Ahead Log issues that slow or prevent recovery
+and in some cases lead to data loss. The fixes reduce the number of WALs
+referenced by a tserver, improve error handling, and improve cleanup.
-+ Eliminates a race condition that could result in data loss during recovery.
-In cases where the GC deletes unreferenced WAL files as the master is simultaneously
-attempting to create the list of WALs necessary for recovery, the master will skip
-files that should be used in the recovery, resulting in data loss. Fixed in [#866].
++ Eliminates a race condition that could result in data loss during recovery.
+If the GC deletes unreferenced WALs from ZK while the master is reading
+recovery WALs from ZK, the master may skip WALs it should not, resulting in
+data loss. Fixed in [#866].
-+ If the metadata table has references to a non-existent WAL file, operations
-for that table will hang. This occurs when files published to zookeeper have existing
-references but subsequent failures have occurred in the WAL files, or if a tserver
-removes the file. Reported in [#949] and fixed in [#1005] [#1057].
++ Opening a new WAL in DFS may fail, but still be advertised in ZK. This could
+result in a missing WAL during recovery, preventing tablets from loading.
+There is no data loss in this case, just WAL references that should not exist.
+Reported in [#949] and fixed in [#1005] [#1057].
-+ tserver failures could result in the creation of many files that are unreferenced and
-never written to, resulting in files with a header but no data. Recovery tries to
-handle these as "empty" files which slows the recovery process. This was fixed in [#823] [#845].
++ tserver failures could result in many empty WALs that unnecessarily slow recovery.
+This was fixed in [#823] [#845].
-+ When the number of WALs referenced by a tserver exceeds a threshold, the tserver will
-flush and minor compact the oldest log to limit the number of wal references needed by
-that tserver. This reduces the number of WALs referenced by many tservers and speeds
-recovery by reducing the number of WALs needed during recovery. Addressed by [#854] [#860].
++ Some write patterns caused tservers to reference an unnecessarily large
+number of WALs, which could slow any recovery. In [#854] [#860] the maximum
+number of WALs referenced was capped regardless of the write pattern,
+avoiding long recovery times.
-+ During tablet recovery, filter out logs that do not define the tablet. [#881]
++ During tablet recovery, filter out logs that do not define the tablet. [#881]
-+ If a tserver fails sorting, a marker file is written to the recovery directory.
-The presence of this marker prevents any subsequent attempts at recovery from succeeding.
-Fixed by modifying the WAL RecoveryLogReader to handle failed file markers in [#961] [#1048].
++ If a tserver fails sorting, a marker file is written to the recovery directory.
+This marker prevents any subsequent recovery attempts from succeeding.
+Fixed by modifying the WAL RecoveryLogReader to handle failed file markers in [#961] [#1048].
-+ Use UnsynchronizedBuffer for optimized writeV methods (writeVLong abd writeVInt)
-in Mutations so that only one write call is necessary to the underlying outputstream.
-This was mainly done as an attempt to improve WAL performance. It "should" help
-performance of mutations overall since the Hadoop WriteableUtils method can make
-multiple write calls per long. [#669]
++ Improve performance of serializing mutations to a WAL by avoiding frequent synchronization. [#669]
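A simplified sketch of that idea (illustrative only; the class and method names are hypothetical, and this is a generic base-128 varint rather than Accumulo's exact encoding): buffer the variable-length bytes locally so the underlying output stream sees a single `write()` call per value instead of one per byte.

```java
import java.io.IOException;
import java.io.OutputStream;

public class VarLongWriter {

    // Encode v as a little-endian base-128 varint into buf; returns the length used.
    public static int encode(long v, byte[] buf) {
        int i = 0;
        while ((v & ~0x7FL) != 0) {
            buf[i++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
            v >>>= 7;
        }
        buf[i++] = (byte) v; // final byte, continuation bit clear
        return i;
    }

    // One write() call to the underlying stream per value, instead of per byte.
    public static void writeVLong(OutputStream out, long v) throws IOException {
        byte[] buf = new byte[10]; // a 64-bit varint needs at most 10 bytes
        int len = encode(v, buf);
        out.write(buf, 0, len);
    }
}
```

Batching into a local buffer also avoids taking any per-byte lock the stream might hold, which is the synchronization cost the fix targets.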
### Multiple Fixes for Compaction Issues
-+ Stop locking during compaction. Compactions were acquiring the tablet lock between each
-key value. This created unnecessary contention with other operations like scan and
-bulk imports. For [#1031] the the synchronization was removed by [#1032].
++ Stop locking during compaction. Compactions acquired the tablet lock between each
+key/value pair. This created unnecessary contention with other operations like scans and
+bulk imports. The synchronization was removed in [#1031] [#1032].
+ Only re-queue compaction when there is activity. [#759]
### Fix ArrayOutOfBounds error when new files are created (affects all previous versions)
-Accumulo maintains a 1-up counter to keep file names and other identifiers
-unique. This counter is padded with 0 characters when used in file names. If
-the counter becomes sufficiently large, the padding code in versions prior to
-1.9.3 causes an out of bounds error.
-
-Most users will not be affected by this bug, since it requires the counter to
-be very large before the error would be seen. Situations which might cause the
-counter to get very large include: having a very old Accumulo cluster that has
-been running for many years, having a very large cluster, having a cluster that
-writes many files very quickly for long periods of time, having altered the
-current value of the counter in ZooKeeper, or if you experience a bug which
-causes the counter value to skip ahead very rapidly.
-
-If you wish to check to see if you are at risk of being impacted by this bug,
-examine the name of RFiles recently created in your system. If you have one or
-more padded 0 characters (after an initial letter), as in I000zy98.rf or
-I0123abc.rf, you are probably at low risk from this bug.
-
-This issue was fixed in pull request [#562]
+If the 7-digit base-36 number used to name files attempted to grow to 8
+digits, compactions would fail. This was fixed in [#562].
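The failure mode can be illustrated with a sketch (hypothetical padding code in the style the notes describe, not Accumulo's actual source):

```java
public class FileNamePadding {
    // A 1-up counter rendered as a zero-padded 7-character base-36 string,
    // for use in file names like I000zy98.rf.
    public static String pad(long counter) {
        String name = Long.toString(counter, 36);
        // Pre-1.9.3-style padding assumes name is at most 7 characters.
        // Once counter reaches 36^7 (78,364,164,096), name has 8 digits and
        // substring(8) throws StringIndexOutOfBoundsException.
        return "0000000".substring(name.length()) + name;
    }
}
```

For small counters the padding works (`pad(1295)` yields `"00000zz"`); at 36^7 the call fails, matching the compaction failures described.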
### Updated Master Metrics to include FATE metrics.
Added master metrics to provide a snapshot of current FATE operations. The metrics added:
-+ the number of current FATE transactions in progress,
++ the number of current FATE transactions in progress,
+ the count of child operations that have occurred on the zookeeper FATE node
-+ a count of zookeeper connection errors when the snapshot is taken.
++ a count of zookeeper connection errors when the snapshot is taken.
-The number of child operations provides a light-weight surrogate for FATE transaction
+The number of child operations provides a light-weight surrogate for FATE transaction
progression between snapshots. The metrics are controlled with the following properties:
* master.fate.metrics.enabled - defaults to _false_ to preserve current metric reporting
* master.fate.metrics.min.update.interval - defaults to _60s_ - there is a hard minimum of 10s.
-When enabled, the metrics are published to JMX and can optionally be configured using standard
+When enabled, the metrics are published to JMX and can optionally be configured using standard
hadoop metrics2 configuration files.
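Assuming the standard Accumulo site-XML property format, enabling the metrics might look like this (a sketch; the interval value shown is the default named above):

```xml
<!-- accumulo-site.xml (sketch): enable master FATE metrics -->
<property>
  <name>master.fate.metrics.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- how often the FATE snapshot may refresh; hard minimum is 10s -->
  <name>master.fate.metrics.min.update.interval</name>
  <value>60s</value>
</property>
```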
### Fixed issues with Native Maps with libstdc++ 8.2 and higher
-Versions of libstdc++ 8.2 and higher triggered errors within within the native map code.
-This release fixes issues [#767], [#769] and ...
+Versions of libstdc++ 8.2 and higher triggered errors within the native map code.
+This release fixes issues [#767], [#769], {% ghi 1064 %}, and {% ghi 1070 %}.
### Fixed splitting tablets with files and no data
@@ -116,21 +93,20 @@ In [#978] and [#981] scans that wait too long for files now log a message.
### Fixed race condition in table existence check.
-The Accumulo client code that checks if tables exists had a race
+The Accumulo client code that checks if tables exist had a race
condition. The race was fixed in [#768] and [#973].
### Support running Mini Accumulo using Java 11
-Mini Accumulo made some assumptions about classloaders that were no
-longer true in Java 11. This caused Mini to fail in Java 11. In
+Mini Accumulo made some assumptions about classloaders that were no
+longer true in Java 11. This caused Mini to fail in Java 11. In
[#924] Mini was updated to work with Java 11, while still working
with Java 7 and 8.
### Fixed issue with improperly configured Snappy
-If snappy was configured for a table and the snappy libraries were not
-available on the system then this could cause minor compactions to hang
-forever. In [#920] and [#925] this was fixed and minor
+If snappy was configured and the snappy libraries were not available, minor
+compactions could hang forever. This was fixed in [#920] and [#925]; minor
compactions now proceed when a different compression is configured.
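As a hedged example, a table could be switched from snappy to a different codec in the Accumulo shell (`mytable` is a placeholder; `table.file.compress.type` is Accumulo's standard compression property):

```
config -t mytable -s table.file.compress.type=gz
```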
### Handle bad locality group config.
@@ -142,19 +118,19 @@ inoperative. This was fixed in [#819] and [#840].
There was a race condition in bulk import that could result in files
being imported after a bulk import transaction had completed. In the
-worst case these files were already compacted and garbage collected.
-This would cause a tablet to have a reference to a file that did not
+worst case these files were already compacted and garbage collected.
+This would cause a tablet to have a reference to a file that did not
exist. No data would have been lost, but it would cause scans to fail.
The race was fixed in [#800] and [#837].
### Fixed issue with HostRegexTableLoadBalancer
-This addresses an issue when using the HostRegexTableLoadBalancer
-when the default pool is empty. The load balancer will not assign the tablets at all.
-Here, we select a random pool to assign the tablets to. This behavior is on by
-default in the HostRegexTableLoadBalancer but can be disabled via
+This addresses an issue with the HostRegexTableLoadBalancer: when the default
+pool is empty, the load balancer would not assign the tablets at all. Now a
+random pool is selected for assignment. This behavior is on by default in the
+HostRegexTableLoadBalancer but can be disabled via the
HostRegexTableLoadBalancer configuration setting
- _table.custom.balancer.host.regex.HostTableLoadBalancer.ALL_
+ _table.custom.balancer.host.regex.HostTableLoadBalancer.ALL_
Fixed in [#691] - backported to 1.9 in [#710]
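For example, the random-pool fallback could be turned off from the Accumulo shell using the property above (a sketch; shown as a system-wide setting):

```
config -s table.custom.balancer.host.regex.HostTableLoadBalancer.ALL=false
```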
### Update to libthrift version