You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by ab...@apache.org on 2019/01/09 18:11:09 UTC
[kudu] branch master updated (920c21a -> b0086fe)
This is an automated email from the ASF dual-hosted git repository.
abukor pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git.
from 920c21a tablet: clean up MergeIterState API
new 17b5efd KUDU-2656: pass IOContext to ValidateDeltaOrder
new 6103f87 Reduce severity of "Error trying to read ahead of the log" log message
new 5e334de KUDU-2195. Add additional gflag to force sync of consensus metadata
new b0086fe [build] Update repo URL
The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
RELEASING.adoc | 2 +-
build-support/push_to_asf.py | 2 +-
src/kudu/consensus/consensus_meta.cc | 11 +++++++++-
src/kudu/consensus/consensus_queue.cc | 11 +++++-----
src/kudu/tablet/delta_compaction.cc | 4 ++--
src/kudu/tablet/delta_tracker.cc | 21 ++++++++++++------
src/kudu/tablet/delta_tracker.h | 10 ++++++++-
src/kudu/tablet/major_delta_compaction-test.cc | 30 ++++++++++++++++++++++++++
src/kudu/tablet/tablet.cc | 5 +++--
src/kudu/tablet/tablet.h | 3 ++-
10 files changed, 78 insertions(+), 21 deletions(-)
[kudu] 02/04: Reduce severity of "Error trying to read ahead of the
log" log message
Posted by ab...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
abukor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
commit 6103f8786358df1766ba41c8d8dc2feb9c8c6f53
Author: Will Berkeley <wd...@gmail.org>
AuthorDate: Tue Jan 8 15:37:57 2019 -0800
Reduce severity of "Error trying to read ahead of the log" log message
In clusters under load, it is typical to see the ERROR log dominated by
messages like the following:
E1221 09:09:13.869918 143384 consensus_queue.cc:440] T 1d030514317942ec9d7796a8a029dace P a4eea0affa4545879c8988b24d8cb2bb [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 6620 is ahead of the local log (next sequential op: 6620). Destination peer: Peer: c0a2e34b708845efb8f090459815272c, Is new: false, Last received: 2995.6619, Next index: 6621, Last known committed idx: 6620, Last exchange result: ERROR, Needs tablet copy: false
This message is logged at the error level, and looks a little bit scary,
but it is in fact harmless. Here's what happens:
1. A leader neglects its duties and the followers elect a new leader.
2. The new leader manages to replicate more ops (usually just the NO_OP
asserting leadership).
3. The leader of the previous term tries to replicate an op to a peer in
the new term.
4. From the response, it founds out that
a. It is in an earlier term, so it should step down and increment its
term.
b. The last op the peer saw is (leader's index + k) for some k > 0
(usually k = 1). So the leader will attempt to send up ops of
index up through (leader's index + k).
5. The term change is asynchronous, and before it happens the leader
tries to prepare a new request to the peer whose log is ahead of the
local log. This causes the ERROR message.
6. The term change happens. The leader steps down, and life goes on.
Note that the leader should also have received a VoteRequest with the
new term and an UpdateConsensus with the new term from the leader. In
general, this message appears only when the leader is under enough
stress to lose its leadership and be unable to service some consensus
RPCs in a timely manner. It is not in an of itself a problem, and it's a
decent indicator of stress on the leader, so I am leaving the message
but downgrading it to INFO level.
See KUDU-1078 for more information about this situation, especially its
previous causes (which were at one time actually harmful).
Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Reviewed-on: http://gerrit.cloudera.org:8080/12185
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Will Berkeley <wd...@gmail.com>
---
src/kudu/consensus/consensus_queue.cc | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/src/kudu/consensus/consensus_queue.cc b/src/kudu/consensus/consensus_queue.cc
index 2c64e1c..c618002 100644
--- a/src/kudu/consensus/consensus_queue.cc
+++ b/src/kudu/consensus/consensus_queue.cc
@@ -696,11 +696,12 @@ Status PeerMessageQueue::RequestForPeer(const string& uuid,
}
if (s.IsIncomplete()) {
// IsIncomplete() means that we tried to read beyond the head of the log
- // (in the future). See KUDU-1078.
- LOG_WITH_PREFIX_UNLOCKED(ERROR) << "Error trying to read ahead of the log "
- << "while preparing peer request: "
- << s.ToString() << ". Destination peer: "
- << peer_copy.ToString();
+ // (in the future). This is usually a sign that this peer is under load
+ // and is about to step down as leader. See KUDU-1078.
+ LOG_WITH_PREFIX_UNLOCKED(INFO) << "Error trying to read ahead of the log "
+ << "while preparing peer request: "
+ << s.ToString() << ". Destination peer: "
+ << peer_copy.ToString();
return s;
}
LOG_WITH_PREFIX_UNLOCKED(FATAL) << "Error reading the log while preparing peer request: "
[kudu] 04/04: [build] Update repo URL
Posted by ab...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
abukor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
commit b0086fe1b9bcc1e22cb5ed134cd5c3ffa9a9e2d7
Author: Attila Bukor <ab...@apache.org>
AuthorDate: Thu Jan 3 14:58:54 2019 +0100
[build] Update repo URL
According to Apache Infra Team the git repo needs to be moved[1] from
git-wip-us.a.o to gitbox.a.o. This commit changes the
build-support/push_to_asf.py script and the releasing docs to use the
new repo URL.
[1] https://lists.apache.org/thread.html/d6db77cb3ba1c03e25e3c3ecd04577ac067b3a2d8678b26203d89b14@%3Cdev.kudu.apache.org%3E
Change-Id: I80132a8ccfa3ca4f7647ec18bf558940399db684
Reviewed-on: http://gerrit.cloudera.org:8080/12150
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
RELEASING.adoc | 2 +-
build-support/push_to_asf.py | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/RELEASING.adoc b/RELEASING.adoc
index 559ac23..4d6839e 100644
--- a/RELEASING.adoc
+++ b/RELEASING.adoc
@@ -50,7 +50,7 @@ in `master`.
----
. Push the branch to public remotes https://github.com/cloudera/kudu.git and
-https://git-wip-us.apache.org/repos/asf/kudu.git. The following example
+https://gitbox.apache.org/repos/asf/kudu.git. The following example
assumes they are called `cloudera` and `apache`.
+
----
diff --git a/build-support/push_to_asf.py b/build-support/push_to_asf.py
index 1752dbe..f8d0ece 100755
--- a/build-support/push_to_asf.py
+++ b/build-support/push_to_asf.py
@@ -42,7 +42,7 @@ import sys
from kudu_util import check_output, confirm_prompt, Colors, get_my_email, init_logging
-APACHE_REPO = "https://git-wip-us.apache.org/repos/asf/kudu.git"
+APACHE_REPO = "https://gitbox.apache.org/repos/asf/kudu.git"
GERRIT_URL = "ssh://<username>@gerrit.cloudera.org:29418/kudu"
GERRIT_URL_RE = re.compile(r"ssh://.+@gerrit.cloudera.org:29418/kudu")
[kudu] 01/04: KUDU-2656: pass IOContext to ValidateDeltaOrder
Posted by ab...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
abukor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
commit 17b5efd4784caf6847f3b786632b79b9893a47c1
Author: Andrew Wong <aw...@cloudera.com>
AuthorDate: Tue Jan 8 00:14:33 2019 -0800
KUDU-2656: pass IOContext to ValidateDeltaOrder
Previously, checksum errors that occurred during calls to the debug-only
function ValidateDeltaOrder() would lead to a DCHECK failure. This patch
plumbs an IOContext to these calls to avoid this and adds a regression
test for such cases.
Change-Id: I3364ad95a5b3608db6538151007c4b6d16500f2b
Reviewed-on: http://gerrit.cloudera.org:8080/12178
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
src/kudu/tablet/delta_compaction.cc | 4 ++--
src/kudu/tablet/delta_tracker.cc | 21 ++++++++++++------
src/kudu/tablet/delta_tracker.h | 10 ++++++++-
src/kudu/tablet/major_delta_compaction-test.cc | 30 ++++++++++++++++++++++++++
src/kudu/tablet/tablet.cc | 5 +++--
src/kudu/tablet/tablet.h | 3 ++-
6 files changed, 60 insertions(+), 13 deletions(-)
diff --git a/src/kudu/tablet/delta_compaction.cc b/src/kudu/tablet/delta_compaction.cc
index ff2e071..475306a 100644
--- a/src/kudu/tablet/delta_compaction.cc
+++ b/src/kudu/tablet/delta_compaction.cc
@@ -421,12 +421,12 @@ Status MajorDeltaCompaction::UpdateDeltaTracker(DeltaTracker* tracker,
// Even if we didn't create any new redo blocks, we still need to update the
// tracker so it removes the included_stores_.
- tracker->AtomicUpdateStores(included_stores_, new_redo_stores, REDO);
+ tracker->AtomicUpdateStores(included_stores_, new_redo_stores, io_context, REDO);
// We only call AtomicUpdateStores() for UNDOs if we wrote UNDOs. We're not
// removing stores so we don't need to call it otherwise.
if (!new_undo_stores.empty()) {
- tracker->AtomicUpdateStores({}, new_undo_stores, UNDO);
+ tracker->AtomicUpdateStores({}, new_undo_stores, io_context, UNDO);
}
return Status::OK();
}
diff --git a/src/kudu/tablet/delta_tracker.cc b/src/kudu/tablet/delta_tracker.cc
index 8c3df30..9b4fd2b 100644
--- a/src/kudu/tablet/delta_tracker.cc
+++ b/src/kudu/tablet/delta_tracker.cc
@@ -204,8 +204,10 @@ string JoinDeltaStoreStrings(const SharedDeltaStoreVector& stores) {
} // anonymous namespace
+#ifndef NDEBUG
Status DeltaTracker::ValidateDeltaOrder(const std::shared_ptr<DeltaStore>& first,
const std::shared_ptr<DeltaStore>& second,
+ const IOContext* io_context,
DeltaType type) {
shared_ptr<DeltaStore> first_copy = first;
shared_ptr<DeltaStore> second_copy = second;
@@ -216,14 +218,14 @@ Status DeltaTracker::ValidateDeltaOrder(const std::shared_ptr<DeltaStore>& first
shared_ptr<DeltaFileReader> first_clone;
RETURN_NOT_OK(down_cast<DeltaFileReader*>(first.get())->CloneForDebugging(
rowset_metadata_->fs_manager(), mem_trackers_.tablet_tracker, &first_clone));
- RETURN_NOT_OK(first_clone->Init(nullptr));
+ RETURN_NOT_OK(first_clone->Init(io_context));
first_copy = first_clone;
}
if (!second_copy->Initted()) {
shared_ptr<DeltaFileReader> second_clone;
RETURN_NOT_OK(down_cast<DeltaFileReader*>(second.get())->CloneForDebugging(
rowset_metadata_->fs_manager(), mem_trackers_.tablet_tracker, &second_clone));
- RETURN_NOT_OK(second_clone->Init(nullptr));
+ RETURN_NOT_OK(second_clone->Init(io_context));
second_copy = second_clone;
}
@@ -244,15 +246,19 @@ Status DeltaTracker::ValidateDeltaOrder(const std::shared_ptr<DeltaStore>& first
return Status::OK();
}
-Status DeltaTracker::ValidateDeltasOrdered(const SharedDeltaStoreVector& list, DeltaType type) {
+Status DeltaTracker::ValidateDeltasOrdered(const SharedDeltaStoreVector& list,
+ const IOContext* io_context,
+ DeltaType type) {
for (size_t i = 1; i < list.size(); i++) {
- RETURN_NOT_OK(ValidateDeltaOrder(list[i - 1], list[i], type));
+ RETURN_NOT_OK(ValidateDeltaOrder(list[i - 1], list[i], io_context, type));
}
return Status::OK();
}
+#endif // NDEBUG
void DeltaTracker::AtomicUpdateStores(const SharedDeltaStoreVector& stores_to_replace,
const SharedDeltaStoreVector& new_stores,
+ const IOContext* io_context,
DeltaType type) {
std::lock_guard<rw_spinlock> lock(component_lock_);
SharedDeltaStoreVector* stores_to_update =
@@ -299,11 +305,12 @@ void DeltaTracker::AtomicUpdateStores(const SharedDeltaStoreVector& stores_to_re
// Make sure the new stores are already ordered. Non-OK Statuses here don't
// indicate ordering errors, but rather, physical errors (e.g. disk
// failure). These don't indicate incorrectness and are thusly ignored.
- WARN_NOT_OK(ValidateDeltasOrdered(new_stores, type), "Could not validate delta order");
+ WARN_NOT_OK(ValidateDeltasOrdered(new_stores, io_context, type),
+ "Could not validate delta order");
if (start_it != stores_to_update->end()) {
// Sanity check that the last store we are adding would logically appear
// before the first store that would follow it.
- WARN_NOT_OK(ValidateDeltaOrder(*new_stores.rbegin(), *start_it, type),
+ WARN_NOT_OK(ValidateDeltaOrder(*new_stores.rbegin(), *start_it, io_context, type),
"Could not validate delta order");
}
}
@@ -343,7 +350,7 @@ Status DeltaTracker::CommitDeltaStoreMetadataUpdate(const RowSetMetadataUpdate&
rowset_metadata_->AddOrphanedBlocks(removed_blocks);
// Once we successfully commit to the rowset metadata, let's ensure we update
// the delta stores to maintain consistency between the two.
- AtomicUpdateStores(to_remove, new_stores, type);
+ AtomicUpdateStores(to_remove, new_stores, io_context, type);
if (flush_type == FLUSH_METADATA) {
// Flushing the metadata is considered best-effort in this function.
diff --git a/src/kudu/tablet/delta_tracker.h b/src/kudu/tablet/delta_tracker.h
index f1bf87a..edac431 100644
--- a/src/kudu/tablet/delta_tracker.h
+++ b/src/kudu/tablet/delta_tracker.h
@@ -195,6 +195,7 @@ class DeltaTracker {
std::vector<std::shared_ptr<DeltaStore>>* stores,
DeltaType type);
+#ifndef NDEBUG
// Validates that 'first' may precede 'second' in an ordered list of deltas,
// given a delta type of 'type'. This should only be run in DEBUG mode.
//
@@ -202,6 +203,7 @@ class DeltaTracker {
// validation could not be performed.
Status ValidateDeltaOrder(const std::shared_ptr<DeltaStore>& first,
const std::shared_ptr<DeltaStore>& second,
+ const fs::IOContext* io_context,
DeltaType type);
// Validates the relative ordering of the deltas in the specified list. This
@@ -209,14 +211,20 @@ class DeltaTracker {
//
// Crashes if there is an ordering violation, and returns an error if the
// validation could not be performed.
- Status ValidateDeltasOrdered(const SharedDeltaStoreVector& list, DeltaType type);
+ Status ValidateDeltasOrdered(const SharedDeltaStoreVector& list,
+ const fs::IOContext* io_context,
+ DeltaType type);
+#endif // NDEBUG
// Replaces the subsequence of stores that matches 'stores_to_replace' with
// delta file readers corresponding to 'new_delta_blocks', which may be empty.
// If 'stores_to_replace' is empty then the stores represented by
// 'new_delta_blocks' are prepended to the relevant delta stores list.
+ //
+ // In DEBUG mode, this may do IO to validate the delta ordering.
void AtomicUpdateStores(const SharedDeltaStoreVector& stores_to_replace,
const SharedDeltaStoreVector& new_stores,
+ const fs::IOContext* io_context,
DeltaType type);
// Get the delta MemStore's size in bytes, including pre-allocation.
diff --git a/src/kudu/tablet/major_delta_compaction-test.cc b/src/kudu/tablet/major_delta_compaction-test.cc
index 84671a2..69f5c07 100644
--- a/src/kudu/tablet/major_delta_compaction-test.cc
+++ b/src/kudu/tablet/major_delta_compaction-test.cc
@@ -23,6 +23,7 @@
#include <unordered_set>
#include <vector>
+#include <gflags/gflags_declare.h>
#include <glog/logging.h>
#include <gtest/gtest.h>
@@ -30,6 +31,7 @@
#include "kudu/common/iterator.h"
#include "kudu/common/partial_row.h"
#include "kudu/common/schema.h"
+#include "kudu/fs/io_context.h"
#include "kudu/gutil/gscoped_ptr.h"
#include "kudu/gutil/port.h"
#include "kudu/gutil/stringprintf.h"
@@ -42,6 +44,8 @@
#include "kudu/util/status.h"
#include "kudu/util/test_macros.h"
+DECLARE_double(cfile_inject_corruption);
+
using std::shared_ptr;
using std::string;
using std::unordered_set;
@@ -176,6 +180,32 @@ class TestMajorDeltaCompaction : public KuduRowSetTest {
vector<ExpectedRow> expected_state_;
};
+// Regression test for KUDU-2656, wherein a corruption during a major delta
+// compaction would lead to a failure in debug mode.
+TEST_F(TestMajorDeltaCompaction, TestKudu2656) {
+ constexpr int kNumRows = 100;
+ NO_FATALS(WriteTestTablet(kNumRows));
+ ASSERT_OK(tablet()->Flush());
+ vector<shared_ptr<RowSet>> all_rowsets;
+ tablet()->GetRowSetsForTests(&all_rowsets);
+ ASSERT_FALSE(all_rowsets.empty());
+ shared_ptr<RowSet> rs = all_rowsets.front();
+ // Create some on-disk deltas.
+ NO_FATALS(UpdateRows(kNumRows, /*even=*/false));
+ ASSERT_OK(tablet()->FlushBiggestDMS());
+
+ // Major compact some columns.
+ vector<ColumnId> col_ids = { schema_.column_id(1),
+ schema_.column_id(3),
+ schema_.column_id(4) };
+
+ // Injecting a failure should result in an error, not a crash.
+ FLAGS_cfile_inject_corruption = 1.0;
+ fs::IOContext io_context({ "test-tablet" });
+ Status s = tablet()->DoMajorDeltaCompaction(col_ids, rs, &io_context);
+ ASSERT_TRUE(s.IsCorruption()) << s.ToString();
+}
+
// Tests a major delta compaction run.
// Verifies that the output rowset accurately reflects the mutations, but keeps the
// unchanged columns intact.
diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index 0d36f4c..40b728f 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -1083,10 +1083,11 @@ void Tablet::AtomicSwapRowSetsUnlocked(const RowSetVector &to_remove,
}
Status Tablet::DoMajorDeltaCompaction(const vector<ColumnId>& col_ids,
- const shared_ptr<RowSet>& input_rs) {
+ const shared_ptr<RowSet>& input_rs,
+ const IOContext* io_context) {
RETURN_IF_STOPPED_OR_CHECK_STATE(kOpen);
Status s = down_cast<DiskRowSet*>(input_rs.get())
- ->MajorCompactDeltaStoresWithColumnIds(col_ids, nullptr, GetHistoryGcOpts());
+ ->MajorCompactDeltaStoresWithColumnIds(col_ids, io_context, GetHistoryGcOpts());
return s;
}
diff --git a/src/kudu/tablet/tablet.h b/src/kudu/tablet/tablet.h
index 4e3fadc..9853191 100644
--- a/src/kudu/tablet/tablet.h
+++ b/src/kudu/tablet/tablet.h
@@ -385,7 +385,8 @@ class Tablet {
//
// Only used in tests.
Status DoMajorDeltaCompaction(const std::vector<ColumnId>& col_ids,
- const std::shared_ptr<RowSet>& input_rs);
+ const std::shared_ptr<RowSet>& input_rs,
+ const fs::IOContext* io_context = nullptr);
// Calculates the ancient history mark and returns true iff tablet history GC
// is enabled, which requires the use of a HybridClock.
[kudu] 03/04: KUDU-2195. Add additional gflag to force sync of
consensus metadata
Posted by ab...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
abukor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
commit 5e334de24e016192413c23ea2554d49406df1985
Author: Mike Percy <mp...@apache.org>
AuthorDate: Tue Jan 8 17:01:52 2019 -0800
KUDU-2195. Add additional gflag to force sync of consensus metadata
This patch adds an override gflag for consensus metadata fsync so that
XFS users are less likely to lose their consensus metadata files while
voting right before a power outage.
Change-Id: I73212d1670a33479cce7d9ef9ee61cfe9b00cdd3
Reviewed-on: http://gerrit.cloudera.org:8080/12186
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
src/kudu/consensus/consensus_meta.cc | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/kudu/consensus/consensus_meta.cc b/src/kudu/consensus/consensus_meta.cc
index b075134..355aba9 100644
--- a/src/kudu/consensus/consensus_meta.cc
+++ b/src/kudu/consensus/consensus_meta.cc
@@ -45,6 +45,10 @@ DEFINE_double(fault_crash_before_cmeta_flush, 0.0,
"consensus metadata. (For testing only!)");
TAG_FLAG(fault_crash_before_cmeta_flush, unsafe);
+DEFINE_bool(cmeta_force_fsync, false,
+ "Whether fsync() should be called when consensus metadata files are updated");
+TAG_FLAG(cmeta_force_fsync, advanced);
+
namespace kudu {
namespace consensus {
@@ -241,7 +245,12 @@ Status ConsensusMetadata::Flush(FlushMode flush_mode) {
// essentially an extension of the primary durability mechanism of the
// consensus subsystem: the WAL. Using the same flag ensures that the WAL
// and the consensus metadata get the same durability guarantees.
- FLAGS_log_force_fsync_all ? pb_util::SYNC : pb_util::NO_SYNC),
+ // We add FLAGS_cmeta_force_fsync to support an override in certain
+ // cases. Some filesystems such as ext4 are more forgiving to omitting an
+ // fsync() due to periodic commit with default settings, whereas other
+ // filesystems such as XFS will not commit as often and need the fsync to
+ // avoid significant data loss when a crash happens.
+ FLAGS_log_force_fsync_all || FLAGS_cmeta_force_fsync ? pb_util::SYNC : pb_util::NO_SYNC),
Substitute("Unable to write consensus meta file for tablet $0 to path $1",
tablet_id_, meta_file_path));
RETURN_NOT_OK(UpdateOnDiskSize());