You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org> on 2017/05/04 22:40:19 UTC
[kudu-CR] [catalog manager] categorization of rw operation failures
David Ribeiro Alves has posted comments on this change.
Change subject: [catalog_manager] categorization of rw operation failures
......................................................................
Patch Set 26:
(25 comments)
http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/integration-tests/catalog_manager_tsk-itest.cc
File src/kudu/integration-tests/catalog_manager_tsk-itest.cc:
PS26, Line 82: We want this to happen as often as possible, but
: // due to truncation issues (double --> int64_t) two TSK keys might be
: // generated with interval less than 1 second, which is not desirable.
not sure what's the intent of this sentence? do you do something to mitigate the "undesirability"?
PS26, Line 88: master_tsk_propagated_by_non_leaders
this flag is a bit cryptic, at first I thought this was referring to tablet servers. not sure it's defined in this patch but master_non_leader_masters_propagate_tsk would be clearer, IMO
PS26, Line 114: builder
: .default_admin_operation_timeout(timeout)
no need to wrap, in fact could you do all of this in the decl line?
Line 126: // signed by the key which hasn't yet reached the tserver. This can
with a key which hasn't yet reached the tserver or non-leader master (or somesuch)
PS26, Line 132: IPKI
can you TODO yourself?
PS26, Line 134: Ideally, (hb_interval_ms_ * 2) would be enough, but due
: // to network latency and multi-process OS, the factor should be greater.
how about: We allow for extra slop beyond the minimum (hb_interval_ms_ * 2) to account for network latency and thread preemption.
PS26, Line 136: SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_ * 10));
these timeouts always come back to bite us in the form of flakyness. could we sleep the minimum and keep retrying for a while (always making sure that we get the expected error and none other)
PS26, Line 137: // std::min<int64_t>(run_time_seconds_ * 1000 / 2, hb_interval_ms_ * 50)));
did you mean to leave this?
PS26, Line 185: //SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_));
left over garbage?
http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:
PS26, Line 471: const
we never declare status const. I know you can here, but I haven't seen that anywhere, so I'd prefer to follow the same style
PS26, Line 484: const
same
PS26, Line 492: LOG(WARNING) << err_msg << "will try next time";
this is not very informative
PS26, Line 494: has left
where did it go? :)
PS26, Line 762: const
same
PS26, Line 777: leadership change in the middle:
: // if the master server loses its leadership role, then there will be an
: // error if trying to write into the system table. If that happens, we do
: // not activate the newly generated CA information since it is no longer
: // relevant. If it was leadership change indeed, the system table should
: // contain the CA certificate information when this master is re-elected
: // next.
numbered bullets or some better way to lay this out. it's hard to parse as it is
PS26, Line 794:
a
PS26, Line 857: LOG(INFO) << "Generated new certificate authority record";
can you provide more info? it'd be good to print a seq no or some identifier
PS26, Line 914: using std::bind;
using declarations go on the top of the file
PS26, Line 919: auto wrapper = [this](std::function<Status()> func,
: const Consensus& consensus,
: int64_t start_term,
: const char* op_description) {
: leader_lock_.AssertAcquiredForWriting();
: const Status s = func();
not really sure what's happening here. can you simplify?
PS26, Line 961: static const char* const kLoadMetaOpDescription =
: "Loading table and tablet metadata into memory";
can you declare this const on the top of the file?
PS26, Line 3231: LOG_WITH_PREFIX(INFO)
: << "Latest config for has opid_index of " << latest_index
: << " while this task has opid_index of "
: << cstate_.config().opid_index() << ". Aborting task.";
spurious reformat
PS26, Line 3285: LOG_WITH_PREFIX(WARNING)
: << "ChangeConfig() failed with leader "
: << target_ts_desc_->ToString()
: << " due to CAS failure. No further retry: "
: << status.ToString();
: MarkFailed();
: break;
: default:
: LOG_WITH_PREFIX(INFO)
: << "ChangeConfig() failed with leader "
: << target_ts_desc_->ToString()
: << " due to error "
: << TabletServerErrorPB::Code_Name(resp_.error().code())
: << ". This operation will be retried. Error detail: "
: << status.ToString();
: break;
spurious reformat
http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:
PS26, Line 358: changed from the initial.
what's the initial
PS26, Line 568: of the object
which object?
PS26, Line 607: load TSK records from the system table and import them into the
: // TokenSigner. After doing that check whether it's time to generate new
: // token signing key; if so, then generate and store the new one in the system
: // catalog table. Besides, purge the expired TSKs from the system table.
can you split this into numbered steps (1, 1.1, 1.2, 2) etc
--
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 26
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes