Posted to commits@accumulo.apache.org by mm...@apache.org on 2019/04/04 22:12:26 UTC

[accumulo-website] branch master updated: Update 1.9.3 release notes (#165)

This is an automated email from the ASF dual-hosted git repository.

mmiller pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 598e53c  Update 1.9.3 release notes (#165)
598e53c is described below

commit 598e53c39e4ea724e8fc12b928d27a9309c9152c
Author: Mike Miller <mm...@apache.org>
AuthorDate: Thu Apr 4 18:08:18 2019 -0400

    Update 1.9.3 release notes (#165)
    
    Co-authored-by: Ed Coleman <de...@etcoleman.com>
---
 _posts/release/2018-07-18-accumulo-1.9.3.md | 182 ++++++++++++++--------------
 1 file changed, 88 insertions(+), 94 deletions(-)

diff --git a/_posts/release/2018-07-18-accumulo-1.9.3.md b/_posts/release/2018-07-18-accumulo-1.9.3.md
index ceeda36..9f12d39 100644
--- a/_posts/release/2018-07-18-accumulo-1.9.3.md
+++ b/_posts/release/2018-07-18-accumulo-1.9.3.md
@@ -16,27 +16,52 @@ Users of any previous version of 1.8 or 1.9 are encouraged to upgrade
 
 ### Multiple Fixes for Write Ahead Logs
 
-Fixes issues where a reference may be created in zookeeper and referenced in the metadata table or 
-a large number of empty wal files are created, which can slow recovery.
+This release fixes a number of issues with Write Ahead Logs that slow or prevent recovery
+and in some cases lead to data loss. The fixes reduce the number of WALs that need to be
+referenced by a tserver, improve error handling of WAL issues, and improve WAL cleanup processes.
 
-Fix improper handling of new WAL failure [#949] [#1005] [#1057]  Fixes issues where a wal that 
-does not exist (create failed or was deleted by a tserver) but the reference was created in 
-zookeeper. The reference to the non-existent wal causes metadata operations to hang until reference
-is removed.
++ Eliminates a race condition that could result in data loss during recovery.
+In cases where the GC deleted unreferenced WAL files while the master was simultaneously
+attempting to create the list of WALs necessary for recovery, the master could skip
+files that should be used in the recovery, resulting in data loss. Fixed in [#866].
 
-Fix Unreferenced closed WALs that were never written to [#823] [#845] Fixes issue where failures
-could result in the creation of many files with a header and no data. Recovery treated these 
-as an "empty" files during recovery.
++ If the metadata table has references to a non-existent WAL file, operations
+for that table will hang. This occurs when a reference to the WAL is published to
+zookeeper but the WAL file itself does not exist, either because its creation failed
+or because a tserver removed the file.  Reported in [#949] and fixed in [#1005] [#1057].
 
-Handle many tablets referencing many WALs [#854] [#860] Reduces the number of wals that a tserver
-references, this speeds recovery by reducing the number of wals needed during recovery. When a the
-number of wals referenced by a tserver exceeds a threshold, the tserver will flush and minor 
-compactthe oldest log to limit the number of wal references.   
++ Tserver failures could result in the creation of many files that are unreferenced and
+never written to, resulting in files with a header but no data. Recovery tries to
+handle these as "empty" files, which slows the recovery process. This was fixed in [#823] [#845].
 
-Fix WAL RecoveryLogReader error on failed file [#961] [#1048] If a tserver fails sorting, a marker 
-file is written to the recovery directory. The presence of this marker prevented any subsequent 
-attempts at recovery from succeeding. 
-(originally [ACCUMULO-4861](https://issues.apache.org/jira/browse/ACCUMULO-4861) 
++ When the number of WALs referenced by a tserver exceeds a threshold, the tserver will
+flush and minor compact the oldest log to limit the number of WAL references needed by
+that tserver. This reduces the number of WALs referenced by each tserver and speeds
+recovery by reducing the number of WALs needed during recovery.  Addressed by [#854] [#860].
+
++ During tablet recovery, filter out logs that do not define the tablet. [#881]   
+
++ If a tserver fails sorting, a marker file is written to the recovery directory.
+The presence of this marker prevented any subsequent attempts at recovery from succeeding.
+Fixed by modifying the WAL RecoveryLogReader to handle failed file markers in [#961] [#1048].
+
++ Use UnsynchronizedBuffer for optimized writeV methods (writeVLong and writeVInt)
+in Mutations so that only one write call is necessary to the underlying output stream.
+This was mainly done as an attempt to improve WAL performance. It "should" help
+performance of mutations overall since the Hadoop WritableUtils method can make
+multiple write calls per long. [#669]
+
+Moved optimized writeVLong and writeVInt to UnsynchronizedBuffer
+to replace use of WritableUtils methods. Each optimized method will
+only make one write call to the underlying output stream.
+
+### Multiple Fixes for Compaction Issues
+
++ Stop locking during compaction.  Compactions were acquiring the tablet lock between each 
+key value. This created unnecessary contention with other operations like scan and 
+bulk imports.  For [#1031] the synchronization was removed by [#1032].
+
++ Only re-queue compaction when there is activity. [#759]
 
 ### Fix ArrayOutOfBounds error when new files are created (affects all previous versions)
 
@@ -60,49 +85,69 @@ I0123abc.rf, you are probably at low risk from this bug.
 
 This issue was fixed in pull request [#562]
 
+### Updated Master Metrics to include FATE metrics.
+
+Added master metrics to provide a snapshot of current FATE operations.  The metrics added are:
++ the number of current FATE transactions in progress,
++ the count of child operations that have occurred on the zookeeper FATE node, and
++ a count of zookeeper connection errors when the snapshot is taken.
+
+The number of child operations provides a light-weight surrogate for FATE transaction 
+progression between snapshots. The metrics are controlled with the following properties:
+
+* master.fate.metrics.enabled - defaults to _false_ to preserve current metric reporting.
+* master.fate.metrics.min.update.interval - defaults to _60s_; there is a hard limit of 10s.
+
+When enabled, the metrics are published to JMX and can optionally be configured using standard 
+hadoop metrics2 configuration files.
+
+### Fixed issues with Native Maps with libstdc++ 8.2 and higher
+
+Versions of libstdc++ 8.2 and higher triggered errors within the native map code.
+This release fixes issues [#767], [#769] and ...
+
 ### Stop locking during compaction
 
 Compactions were acquiring the tablet lock between each key value. 
 This created unnecessary contention with other operations like scan and 
-bulk imports.  For {% ghi 1031 %} the the synchronization was removed
-by {% ghi 1032 %}. 
+bulk imports.  For [#1031] the synchronization was removed
+by [#1032].
 
 ### Fixed splitting tablets with files and no data
 
 The split code assumed that if a tablet had files that it had data in
 those files.  There are some edge case where this is not true.  Updated
-the split code to handle this {% ghi 998 %} {% ghi 999 %} .
+the split code to handle this [#998] [#999].
 
 ### Log when a scan waits a long time for files.
 
 Accumulo has a configurable limit on the max number of files open in a
 tserver for all scans.  When too many files are open, scans must wait.
-In {% ghi 978 %} and {% ghi 981 %} scans that wait too long for files
-now log a message.
+In [#978] and [#981] scans that wait too long for files now log a message.
 
-### Fixed race condition in table existance check.
+### Fixed race condition in table existence check.
 
 The Accumulo client code that checks if tables exists had a race 
-condition.  The race was fixed in {% ghi 768 %} and {% ghi 973 %}
+condition.  The race was fixed in [#768] and [#973].
 
 ### Support running Mini Accumulo using Java 11
 
 Mini Accumulo made some assumptions about classloaders that were no 
 longer true in Java 11.  This caused Mini to fail in Java 11.  In 
-{% ghi 924 %} Mini was updated to work with Java 11, while still working
+[#924] Mini was updated to work with Java 11, while still working
 with Java 7 and 8.
 
 ### Fixed issue with improperly configured Snappy
 
 If snappy was configured for a table and the snappy libraries were not 
 available on the system then this could cause minor compactions to hang
-forever.  In {% ghi 920 %} and {% ghi 925 %} this was fixed and minor 
+forever.  In [#920] and [#925] this was fixed and minor 
 compactions will proceed when a different compression is configured.
 
 ### Handle bad locality group config.
 
 Improperly configured locality groups could cause a tablet to become
-inoperative.  This was fixed in {% ghi 819 %} and {% ghi 840 %}.
+inoperative.  This was fixed in [#819] and [#840].
 
 ### Fixed bulk import race condition.
 
@@ -111,11 +156,22 @@ being imported after a bulk import transaction had completed.  In the
 worst case these files were already compacted and garbage collected. 
 This would cause a tablet to have a reference to a file that did not 
 exists.  No data would have been lost, but it would cause scans to fail.
-The race was fixed in {% ghi 800 %} and {% ghi 837 %}
+The race was fixed in [#800] and [#837].
+
+### Fixed issue with HostRegexTableLoadBalancer
 
-TODO document {% ghi 759 %} {% ghi 710 %} {% ghi 669%}
+This addresses an issue with the HostRegexTableLoadBalancer when the default pool
+is empty: the load balancer would not assign the tablets at all. With this fix,
+a random pool is selected to assign the tablets to. This behavior is on by
+default in the HostRegexTableLoadBalancer but can be disabled via the
+HostRegexTableLoadBalancer configuration setting
+ _table.custom.balancer.host.regex.HostTableLoadBalancer.ALL_.
+ Fixed in [#691] and backported to 1.9 in [#710].
 
-### ~~Another notable item here (affects versions)~~
+### Update to libthrift version
+
+The packaged binary tarball contains an updated version of libthrift, 0.9.3-1, to
+address a thrift CVE. See issue [#1029].
 
 ## Useful links
 
@@ -123,74 +179,11 @@ TODO document {% ghi 759 %} {% ghi 710 %} {% ghi 669%}
 * [GitHub] - List of issues tracked on GitHub corresponding to this release
 * [1.9.2 release notes][prev_notes] - Release notes showing changes in the previous release
 
-## ~~Other Changes reference (will drop this list upon release)~~
-
-* 07d6346b1 Fix [#994] Consider mutation queue in example config [#1055]
-* fae5fb29a Fixes [#949] improper handling of new WAL failure [#1005]
-* 0a2457b5b Fix warning - remove unused log field
-* 62886c4e8 Update master metrics to provide FATE operational metrics. [#1046]
-* e92ef4ecf Fixes [#1041] Provides proper volumes to chooser from import table command
-* deafe74ae fixes [#1031] fix concurrency bug and remove unused code [#1032]
-* ba943671a Update checkstyle (CVE-2019-9658)
-* 84907b3ab Fix possible bug and improve errors in Tablet [#1027]
-* e0841b77a Update to libthrift 0.9.3-1 [#1029]
-* d450ef401 Fix accumulo script [#1020]
-* 59e056cb5 fixes [#998] handle splitting tablet with no data and files [#999]
-* b3ff32a8f Fix build in latest Eclipse (for 1.9 branch)
-* a3793e68c Clean up accumulo script
-* acd2d9f73 Improve log4j lookup [#919]
-* 6ec00cf34 Add logging for slow file permits requests. [#978] [#981]
-* 89cfb66db fixes [#768] fix race condition in generation of table map [#973]
-* d7fa411c7 Format the thrift files
-* 739ca1c10 Refactor FATE AdminUtil spliting getStatus method. [#891]
-* cee8fa1c4 Upgrade process should not assume the administrative user is named 'root'. [#944]
-* 61c44919d Support running MiniAccumuloCluster using Java 11 [#924]
-* 787a52884 fixes [#920] retry minc on unsatisfied link error [#925]
-* 577dd9a64 During tablet recovery filter logs out that do not define tablet. [#881]
-* c8b413261 [#859] - Fix CLASSPATH bug causing cur dir to be added [#864]
-* 2c2840b18 Fix race condition when getting WALs for dead tserver [#866]
-* 3862a408b fixes [#819] Handle bad locality group config. [#840]
-* db5f18b1f fixes [#800] avoid importing files for completed bulk transactions [#837]
-* a5e341c89 Fix warnings by adding missing generic types
-* 1be829848 fixes ACCUMULO-4410: Master did not resume balancing after administrative tserver shutdown
-* ba572b4f8 Format C++ code (manually)
-* df1061b79 fixes [#767] correct allocation issue in native maps [#769]
-* 61a6eb8fc Merge branch 'pr-697' into 1.9
-* 447b52d38 Minor clean up from [#766]
-* be6154b5a Include commons-configuration in our convenience binaries for Hadoop 2. [#766]
-* b6409effe Only requeue compaction when there was activity [#759]
-* 1d3578841 Revert commons-vfs2 version [#728]
-* 725a450c8 Make the ACCUMULO_MONITOR_BIND_ALL property case-insensitive checking
-* ff452a486 fixes [#587] In table delete wait for tablets assigned to dead servers [#727]
-* 5de8d0f1e TableName from baseSplit is ignored, when getting inputTableConfig within AbstractInputFormat.initialize() [#711]
-* 32bfafe93 Assign all TabletServers to default pool if empty [#710]
-* 4bd9d6f68 ACCUMULO-4496 Update user manual and error messages for FATE upgrade failure.
-* 987ae65cc Use optimized writeV methods in Mutations [#669]
-* 06d80292d Update commons-configuration to latest 1.x [#659]
-* 159c97a3f Introduce a property for a key password for the Monitor keystore file
-* 6933ddbc3 Revert "Do not require a password on the truststore JKS"
-* a590a0ffe Roll back maven-invoker-plugin 3.1.0->3.0.1
-* c6fe62c9f Remove httpclient dependency. [#655]
-* d83276577 AccumuloFileOutputFormatITs need to look on local filesystem for results written into JUnit provided temp dir. [#654]
-* bc1b55b81 Correct how Continuous Ingest specifies the debug log file [#651]
-* 1213c9a9d Do not require a password on the truststore JKS
-* 9c2ef2f5d Fix javadoc doclint settings
-* bc260dcd6 Upgrade findbugs to spotbugs for 1.9
-* 177680829 Fix [#626] Update to apache-21 parent POM
-* 851c2d133 Fix [#639] Update JUnit usage
-* 25472ae0d Use slf4j syntax for key in log message
-* 96e8828fd Put stacktrace back in visibility exceptions
-* eee579d7c Fix [#596] Fix compilation on Java 7
-* 7fd1f7823 Improve info given in visibility exceptions [#578]
-* bffa3c4e1 Improved comments for [#559] changes [#566]
-
-
-
 ## Upgrading
 
 View the [Upgrading Accumulo documentation][upgrade] for guidance.
 
-## Testing
+## Testing TODO
 
 * (Example) All ITs passed with Hadoop 3.0.0 (hadoop.profile=3)
 * (Example) All ITs passed with Hadoop 2.6.4 (hadoop.profile=2)
@@ -216,6 +209,7 @@ View the [Upgrading Accumulo documentation][upgrade] for guidance.
 [#655]: https://github.com/apache/accumulo/issues/655
 [#659]: https://github.com/apache/accumulo/issues/659
 [#669]: https://github.com/apache/accumulo/issues/669
+[#691]: https://github.com/apache/accumulo/issues/691
 [#710]: https://github.com/apache/accumulo/issues/710
 [#711]: https://github.com/apache/accumulo/issues/711
 [#727]: https://github.com/apache/accumulo/issues/727