You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yingchun Lai (Code Review)" <ge...@cloudera.org> on 2019/10/18 10:09:30 UTC

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Yingchun Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14507


Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and change 'live_row_count' to
'assured_live_row_count' on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
15 files changed, 108 insertions(+), 46 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 1
Gerrit-Owner: Yingchun Lai <40...@qq.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc@5587
PS12, Line 5587: // The new reported stat doesn't support 'liv
> It will insert tablet uuid into std::unordered_set for the tablet which doe
But doesn't every tablet report that has a ReportedTabletStatsPB have an on_disk_size()?

Let's assume we're talking about a table where none of the tablets support live row counting. I'm working my way through the layers involved trying to figure this out:
1. The tserver's heartbeating thread will call TabletReplica::UpdateTabletStats periodically.
2. Each call recalculates the on_disk_size and live_row_count (which will fail). If on_disk_size has changed, the replica's cached stats are updated. Also if the replica is a LEADER, it'll be included in the next tablet report. Note: the cached stats begin as an uninitialized PB.
3. When a heartbeat is about to include a tablet, TsTabletManager::CreateReportedTabletPB will include the cached stats in the report if the replica is a LEADER and if on_disk_size exists in the cached stats. That second condition is guaranteed to be true because there's no situation in which on_disk_size has not been calculated at least once by TabletReplica::UpdateTabletStats and stored in the cached stats.
4. The master receives the heartbeat and unpacks the tablet reports. If a report includes stats and the replica was a LEADER, we updated the corresponding table metrics. Remember step #3: the on_disk_size is _always_ set, which means a LEADER replica will always have stats in its report.
5. Now we're in TabletInfo::UpdateMetrics and we're trying to figure out whether to use 'else' or 'else if (!old_stats.has_on_disk_size())' in the condition on L5586.

Okay, now I see what you're saying. I got confused and thought you were advocating for 'else if (!new_stats.has_on_disk_size())'. But you were suggesting to check old_stats, and that's a useful check because it'll constrain the AddTabletNoLiveRowCount call to only happen the one time when we transition from "uninitialized stats" to "has_on_disk_size()" (i.e. the first heartbeat).



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 13
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 04:48:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/11/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

PS11: 
Shouldn't snapshot() also pass the value of m_invalid_for_merge_ through to the copy?



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 11
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 22 Oct 2019 17:18:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/catalog_manager.cc@2937
PS8, Line 2937:   resp->set_on_disk_size(table->GetMetrics()->on_disk_size->value());
              :   if (table->GetMetrics()->TableSupportsLiveRowCount()) {
              :     resp->set_live_row_count(table->GetMetrics()->live_row_count->value());
              :   }
              : 
              :   if (FLAGS_mock_table_metrics_for_testing) {
              :     resp->set_on_disk_size(FLAGS_on_disk_size_for_testing);
              :     resp->set_live_row_count(FLAGS_live_row_count_for_testing);
              :   }
              : 
> Maybe clearer as:
Done


http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/table_metrics.cc
File src/kudu/master/table_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/table_metrics.cc@34
PS8, Line 34:     "There may be some tablets of the table that not support live row counting, "
            :     "the result is exact if TableSupportsLiveRowCount() return true.");
> This second sentence isn't particularly helpful because how could a person 
Done


http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/diskrowset.cc@761
PS8, Line 761:   DCHECK_GE(*count, 0);
> This is now always true. We should add a different underflow check prior to
Done


http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/memrowset.h@260
PS8, Line 260:     DCHECK_GE(*count, 0);
> True by default.
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@723
PS6, Line 723:   virtual scoped_refptr<Metric> snapshot() const = 0;
> I made a mistake in my earlier comment: there's no guarantee that all metri
Thanks for your clarification.
Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 8
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 22 Oct 2019 10:15:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 1:

Prior discussion on https://gerrit.cloudera.org/c/14484/


-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 1
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 10:55:30 +0000
Gerrit-HasComments: No

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc@5587
PS12, Line 5587: metrics_->AddTabletNoLiveRowCount(tablet_id);
This could help to alleviate a large number of redundant inserts for the tablets that doesn't support LiveRowCount:

if (!old_stats.IsInitialized()) {
  metrics_->AddTabletNoLiveRowCount(tablet_id);
}



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 12
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 02:00:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#8).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
26 files changed, 220 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 8
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#3).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and change 'live_row_count' to
'assured_live_row_count' on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
16 files changed, 111 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 3
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/master/table_metrics.cc
File src/kudu/master/table_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/master/table_metrics.cc@28
PS1, Line 28:     kudu::MetricUnit::kBytes,
> Why rename this? It hasn't shipped yet, so we can safely change its semanti
Done


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/master/table_metrics.cc@31
PS1, Line 31: METRIC_DEFINE_gauge_uint64(table, live_row_count, "Table Live Row count",
            :     kudu::MetricUnit::kRows,
> I think this is a bad idea because clients won't know when this happens and
Table metrics aggregation on master is base on heartbeats from tservers, and tservers heartbeats are unordered and asynchronous, table metrics are exposed according to the current received tablets info reported in heartbeats, so master may not see the complete picture of the table when master or tserver restart, these table metrics may incorrect that time, if we can accept this issuse, I think we can do it as you said.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto@199
PS1, Line 199:   required uint64 on_disk_size = 1;
> Technically both of these should be optional; we should always use optional
The new added ReportedTabletStatsPB in ReportedTabletPB is optional, but here in ReportedTabletStatsPB, on_disk_size will always apear in any case, can we keep it as required?


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet.cc@1889
PS1, Line 1889:   scoped_refptr<TabletComponents> comps;
              :   GetComponentsOrNull(&comps);
              :   i
> Should probably be a DCHECK now; it's expected that callers ensure Supports
SupportsLiveRowCount is removed, so shouldn't DCHECK now.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc@824
PS1, Line 824: 
> So we can get a 0 in one of three ways:
Done.
When the metric is shut down, now 0 is exposed, I think it's OK for dead tablet.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.h@775
PS1, Line 775:   // Set 'm_invalid_for_merge_', causing this metric to be invalid for merge.
> Too long, need to wrap.
Done


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.h@794
PS1, Line 794: 
> So as far as I can tell this is true when:
I want to seperate it from 'merged_entities_count_of_tablet', which is also instantiated as hidden but need to do merge, however, the prior implement has some problems, now, I've updated the code.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.cc
File src/kudu/util/metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.cc@775
PS1, Line 775:   UpdateModificationEpoch();
> Oops, I guess we should have had this here from day one?
Yes, however, this MeanGauge type metric is added after 1.10



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Sat, 19 Oct 2019 15:06:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#6).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
18 files changed, 210 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#7).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
25 files changed, 213 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 7
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Reviewed-on: http://gerrit.cloudera.org:8080/14507
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
26 files changed, 269 insertions(+), 110 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 14
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 12:

> (1 comment)

^_^


-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 12
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 04:58:14 +0000
Gerrit-HasComments: No

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 6:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.h@325
PS6, Line 325:   // Update the metrics.
> "Update stats belonging to 'tablet_id' in the table's metrics.'
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.h@330
PS6, Line 330:   // Remove the metrics.
> "Remove stats belonging to 'tablet_id' from table metrics.'
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.cc@4248
PS6, Line 4248:       DCHECK(ts_desc->permanent_uuid() == report.consensus_state().leader_uuid());
> While you're here, could you change this to a DCHECK_EQ?
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h
File src/kudu/master/table_metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h@51
PS6, Line 51:   enum class PartialSupportMetricType {
            :     kLiveRowCount = 0
            :   };
            :   void AddTablet(PartialSupportMetricType type, const std::string& tablet_id);
            :   void RemoveTablet(PartialSupportMetricType type, const std::string& tablet_id);
            :   bool ContainsTablet(PartialSupportMetricType type, const std::string& tablet_id) const;
            :   bool ValidMetric(PartialSupportMetricType type) const;
> I like this approach to keeping track of tablets that don't support live ro
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h@61
PS6, Line 61:   // Tablet ids which not support to report 'live_row_count'.
> "IDs of tablets which do not support reporting live row count."
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h@62
PS6, Line 62:   std::unordered_set<std::string> not_support_live_row_count_tablet_ids_;
> Maybe tablet_ids_no_live_row_count_?
Done


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto@199
PS1, Line 199:   required uint64 on_disk_size = 1;
> This is a somewhat ideological issue, but the summary is that in proto3, 'r
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/rowset.h@167
PS6, Line 167:   // TODO(yingchun): change int64_t to uint64_t
> Will you be handling this in a follow-up patch?
Done in this patch.


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h@303
PS6, Line 303:   // Count live rows of this tablet replica.
             :   //
             :   // Returns a non-ok status if failed.
> "Counts the number of live rows in this tablet replica.
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h@308
PS6, Line 308:   // Return live rows count of this tablet replica, used for metric only.
             :   // Return 0 if failed.
> "Like CountLiveRows but returns 0 on failure."
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h@310
PS6, Line 310:   uint64_t CountLiveRowsForMetric() const;
> How about renaming to CountLiveRowsNoFail? That it's only used for metrics 
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@723
PS6, Line 723:   virtual scoped_refptr<Metric> snapshot() const = 0;
> Does it matter that snapshot() doesn't preserve the source's values of m_ep
InvalidateEpoch in InvalidateIfNeededInMerge is aim to hide itself if it was a snapshot of a 'valid' metric and merge with an 'invalid' metric. If we hide it when snapshot(set m_epoch_ -1, do you mean this?), merge will not execute when merge, and the last result is hiden too.


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@775
PS6, Line 775:   // Set 'm_invalid_for_merge_', causing this metric to be invalid for merge.
> The implementation is obvious from inspection, so let's make the comment hi
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@780
PS6, Line 780:   // If one of the two metrics is invalid for merge, the result will be invalidate and
             :   // return true.
> "Returns whether the merge of 'other' into this metric should be prohibited
Done


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@819
PS6, Line 819:   // Whether this metric is invalid for merge.
             :   bool m_invalid_for_merge_ = false;
> Why do we need this new per-metric state? You alluded to some problems with
I've ever added a patch 59c2de495a16f6c252196e91a9b4beeba28ed969 to count merged entities when merge metrics, e.g. to count how many tablets of a table have on a server, this type of metric is also init as hiden, but should expose when merge, that is different with live_row_count which may init as hiden (for the tablet which not support live row counting) but shouldn't expose when merge. For the latter, m_invalid_for_merge_ should be true.


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@951
PS6, Line 951:     auto gauge = Instantiate(entity, initial_value);
             :     gauge->InvalidateEpoch();
> Could replace these two lines with a call to InstantiateHidden.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 21 Oct 2019 08:38:33 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#4).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
18 files changed, 212 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#5).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
18 files changed, 210 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 5
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 8:

(5 comments)

The UBSAN failures look real.

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/catalog_manager.cc@2937
PS8, Line 2937:   resp->set_on_disk_size(table->GetMetrics()->on_disk_size->value());
              :   if (table->GetMetrics()->TableSupportsLiveRowCount()) {
              :     resp->set_live_row_count(table->GetMetrics()->live_row_count->value());
              :   }
              : 
              :   if (FLAGS_mock_table_metrics_for_testing) {
              :     resp->set_on_disk_size(FLAGS_on_disk_size_for_testing);
              :     resp->set_live_row_count(FLAGS_live_row_count_for_testing);
              :   }
              : 
Maybe clearer as:

  if (PREDICT_FALSE(FLAGS_mock_table_metrics_for_testing)) {
    ...
  } else {
    resp->set_on_disk_size(...)
    if (supports live row count) {
      resp->set_live_row_count(...)
    }
  }


http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/table_metrics.cc
File src/kudu/master/table_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/master/table_metrics.cc@34
PS8, Line 34:     "There may be some tablets of the table that not support live row counting, "
            :     "the result is exact if TableSupportsLiveRowCount() return true.");
This second sentence isn't particularly helpful because how could a person looking at the metrics know whether TableSupportsLiveRowCount() is true or not?

I think it'd be more accurate to say "Only accurate if all tablets in the table support live row counting."


http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/diskrowset.cc@761
PS8, Line 761:   DCHECK_GE(*count, 0);
This is now always true. We should add a different underflow check prior to doing the subtraction.


http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/14507/8/src/kudu/tablet/memrowset.h@260
PS8, Line 260:     DCHECK_GE(*count, 0);
True by default.


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@723
PS6, Line 723:   virtual scoped_refptr<Metric> snapshot() const = 0;
> InvalidateEpoch in InvalidateIfNeededInMerge is aim to hide itself if it wa
I made a mistake in my earlier comment: there's no guarantee that all metrics with the same prototype will have the same values of m_epoch_ and m_invalid_for_merge_. In fact, they're likely to be different (e.g. one tablet that supports live row counting and another table that doesn't).

But, what I meant was to set m_epoch_ equal to whatever it was from the source of the snapshot(). It's a little weird that snapshot() advertises cloning (or almost deep copying) a metric, but then doesn't copy out all of its inherited state.



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 8
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 22 Oct 2019 04:30:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/13/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/13/src/kudu/master/catalog_manager.cc@5587
PS13, Line 5587:       // The new reported stat doesn't support 'live_row_count' and
               :       // this is the first report stat of the tablet.
I asked about this in PS12: if there is a ReportedTabletStatsPB in the tablet report, isn't on_disk_size always set?



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 13
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 03:52:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#13).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
26 files changed, 269 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/13
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 13
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@9
PS13, Line 9: an
a


http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@9
PS13, Line 9: upgrade
upgrading


http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@9
PS13, Line 9: add
adding


http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@9
PS13, Line 9: some
drop 'some'


http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@10
PS13, Line 10: hold
contain


http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@10
PS13, Line 10: some
drop 'some'


http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@11
PS13, Line 11: together with some ones that not support
and legacy tablets which don't support it


http://gerrit.cloudera.org:8080/#/c/14507/13//COMMIT_MSG@15
PS13, Line 15: This patch add feature to validate metric when happend to see an invalid
             : metric when MergeFrom, and disable 'live_row_count' metric of a table
             : when any tablet of a table not support live rows counting on master.
This patch makes the live row count metric for a table enabled only if all tablets of the table support live row counting.


http://gerrit.cloudera.org:8080/#/c/14507/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14507/14//COMMIT_MSG@7
PS14, Line 7: [metrics] Hide metric live_row_count when tablet not support
How did you test this?  Maybe I'm missing something, but I don't see a new test added to verify the new functionality.

If indeed there is no test, how hard is it to add one?


http://gerrit.cloudera.org:8080/#/c/14507/13/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/13/src/kudu/util/metrics.h@773
PS13, Line 773: .
nit: drop the trailing period


http://gerrit.cloudera.org:8080/#/c/14507/13/src/kudu/util/metrics.h@779
PS13, Line 779: also
nit: Also



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 14
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 17:00:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14507/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/3/src/kudu/master/catalog_manager.cc@5577
PS3, Line 5577: metrics_) {
> Once Adar told me that the value is 0 even if it is not set in the pb, so t
Done


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc@812
PS1, Line 812: ptr<Tablet> tablet;
> It't not safe to visit 'tablet_' directly since it will be nullptr.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 5
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Sat, 19 Oct 2019 16:33:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/11/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

PS11: 
> Shouldn't snapshot() also pass the value of m_invalid_for_merge_ through to
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 12
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 00:36:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#9).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
26 files changed, 243 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 9
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 13: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc@5587
PS12, Line 5587: // The new reported stat doesn't support 'liv
> Sorry, should be:
Under what circumstances will a tablet report include stats but not the on disk size? Maybe provide an example or two of how that would help?



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 13
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 03:50:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#2).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and change 'live_row_count' to
'assured_live_row_count' on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
16 files changed, 111 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 2
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14507/1//COMMIT_MSG@9
PS1, Line 9: When upgrade an tserver from version older than 1.11, and add some
           : partitions for an existing table, then the table will hold some tablets
           : that support live row counting together with some ones that not support
The live row counting is base on the newly created tablet, but not for the old ones. So, yes, the aggregated live row counting is not exact at the beginning for the old table which is range partitioned. But, the historical range partitions will be dropped as time going on. That means the number will be correct after all of the old partitions are deleted. If we can't accept this time window, I think we need to add a new flag for SysTablesEntryPB in the master.proto to indicate which table supports this feature and deliver it to the newly create tablets to report stats. What do you think?



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 1
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 12:24:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#10).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
26 files changed, 248 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 10
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#11).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
26 files changed, 248 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/11
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 11
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/14507/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14507/1//COMMIT_MSG@9
PS1, Line 9: When upgrade an tserver from version older than 1.11, and add some
           : partitions for an existing table, then the table will hold some tablets
           : that support live row counting together with some ones that not support
> The live row counting is base on the newly created tablet, but not for the 
Can't we figure that out at runtime using the aggregated input from all tablets? Why do we need persistent state for it? If even one tablet doesn't support live row counting, we disable the feature for the table. The moment the last holdout is dropped, we can safely aggregate the live row count and report it to clients.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/master/table_metrics.cc
File src/kudu/master/table_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/master/table_metrics.cc@28
PS1, Line 28: METRIC_DEFINE_gauge_uint64(table, assured_live_row_count, "Assured Table Live Row count",
Why rename this? It hasn't shipped yet, so we can safely change its semantics while retaining the existing name.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/master/table_metrics.cc@31
PS1, Line 31:     "Partial tablets of the table may not support live row counting, we only sum up the "
            :     "tablets which support.");
I think this is a bad idea because clients won't know when this happens and will just see a very odd live row count. I'd rather aggregation be black or white: if all tablets support live row counting, we'll aggregate it and provide it as part of a table's metrics. Otherwise, we don't aggregate or publish it.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto@199
PS1, Line 199:   required uint64 on_disk_size = 1;
Technically both of these should be optional; we should always use optional for new fields.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet.cc@1889
PS1, Line 1889:   if (!metadata_->supports_live_row_count()) {
              :     return Status::NotSupported("This tablet doesn't support live row counting");
              :   }
Should probably be a DCHECK now; it's expected that callers ensure SupportsLiveRowCount is true first, right?


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc@824
PS1, Line 824:     ignore_result(tablet->CountLiveRows(&ret));
So we can get a 0 in one of three ways:
1. The live row count is actually 0.
2. CountLiveRows returned a RuntimeError.
3. tablet was null.

Ultimately we only want to report the live row count in the first case, right? Seems like we need to restructure the code further to do that:

- Tablet needs only one method to get the live row count. It returns a bad Status if it doesn't support it, or if the tablet was shut down. So basically, the status quo before your change.
- TabletReplica also needs just one method to get the live row count. However, that method should return a bad Status if Tablet::CountLiveRows returned a bad Status, and another bad Status if the tablet is null.
- UpdateTabletStats calls TabletReplica::CountLiveRows but skips set_live_row_count() if there was a bad Status.
- The metrics change (to check whether we support live row counting and instantiate accordingly) should remain, but the function binding will need to be adjusted.

What happens to the metric when the tablet is shut down? In theory we should make it hidden, right? Should TabletReplica::Stop InvalidateEpoch on the metric?


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.h@775
PS1, Line 775:   // When any of the metrics is invalid for MergeFrom, the result will be invalidate and
Too long, need to wrap.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.h@794
PS1, Line 794: 
So as far as I can tell this is true when:
1. A metric is no longer hidden, or was never instantiated as hidden, and
2. A metric has been touched at least once, or is of the kind that can't tell whether it's been touched (e.g. a Gauge).

The first part makes sense, but why the second part? We only filter on "touchedness" when dumping the diagnostics log, not in the general case of dumping metrics in the web UI.

If we removed the second part, then the semantics of InvalidateIfNeededInMerge make more sense: if one of the two metrics to be merged is still hidden, the other must be made hidden too.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.cc
File src/kudu/util/metrics.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/util/metrics.cc@775
PS1, Line 775:   UpdateModificationEpoch();
Oops, I guess we should have had this here from day one?



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 3
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 18:32:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc@5587
PS12, Line 5587: // The new reported stat doesn't support 'liv
> Sorry, should be:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 13
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 03:51:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Yifan Zhang, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14507

to look at the new patch set (#12).

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................

[metrics] Hide metric live_row_count when tablet not support

When upgrade an tserver from version older than 1.11, and add some
partitions for an existing table, then the table will hold some tablets
that support live row counting together with some ones that not support.
Tablet which not support live row counting has value of -1, in this
case, metrics merge on tserver and metrics aggregate on master have
problems.
This patch add feature to validate metric when happend to see an invalid
metric when MergeFrom, and disable 'live_row_count' metric of a table
when any tablet of a table not support live rows counting on master.

Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master_path_handlers.cc
M src/kudu/master/table_metrics.cc
M src/kudu/master/table_metrics.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/metrics.cc
M src/kudu/util/metrics.h
26 files changed, 267 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/14507/12
-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 12
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14507/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14507/1//COMMIT_MSG@9
PS1, Line 9: When upgrade an tserver from version older than 1.11, and add some
           : partitions for an existing table, then the table will hold some tablets
           : that support live row counting together with some ones that not support
> Can't we figure that out at runtime using the aggregated input from all tab
All right, thanks for your clarification.


http://gerrit.cloudera.org:8080/#/c/14507/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/3/src/kudu/master/catalog_manager.cc@5577
PS3, Line 5577: uint64_t old_live_row_count = old_stats.has_live_row_count() ? old_stats.live_row_count() : 0;
Once Adar told me that the value is 0 even if it is not set in the pb, so they are the same:
uint64_t old_live_row_count = old_stats.live_row_count();


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto@199
PS1, Line 199:   required uint64 on_disk_size = 1;
> Technically both of these should be optional; we should always use optional
the same.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet.cc@1889
PS1, Line 1889:   if (!metadata_->supports_live_row_count()) {
              :     return Status::NotSupported("This tablet doesn't support live row counting");
              :   }
> Should probably be a DCHECK now; it's expected that callers ensure Supports
the same.


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/tablet_replica.cc@812
PS1, Line 812: tablet_->SupportsLiveRowCount()
It't not safe to visit 'tablet_' directly since it will be nullptr.



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 3
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Sat, 19 Oct 2019 13:14:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc@5587
PS12, Line 5587: metrics_->AddTabletNoLiveRowCount(tablet_id);
> This could help to alleviate a large number of redundant inserts for the ta
Sorry, should be:
if (!old_stats.has_on_disk_size()) {
  metrics_->AddTabletNoLiveRowCount(tablet_id);
}



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 12
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 02:03:57 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/12/src/kudu/master/catalog_manager.cc@5587
PS12, Line 5587: metrics_->AddTabletNoLiveRowCount(tablet_id);
> Under what circumstances will a tablet report include stats but not the on 
It will insert tablet uuid into std::unordered_set for the tablet which doesn't support LiveRowCount when each report arrives.
Thus, I think it will be better to insert only once. And for the uninitialized stat, there is no on disk size.



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 12
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 23 Oct 2019 04:12:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [metrics] Hide metric live row count when tablet not support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14507 )

Change subject: [metrics] Hide metric live_row_count when tablet not support
......................................................................


Patch Set 6:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.h@325
PS6, Line 325:   // Update the metrics.
"Update stats belonging to 'tablet_id' in the table's metrics.'


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.h@330
PS6, Line 330:   // Remove the metrics.
"Remove stats belonging to 'tablet_id' from table metrics.'


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/catalog_manager.cc@4248
PS6, Line 4248:       DCHECK(ts_desc->permanent_uuid() == report.consensus_state().leader_uuid());
While you're here, could you change this to a DCHECK_EQ?


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h
File src/kudu/master/table_metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h@51
PS6, Line 51:   enum class PartialSupportMetricType {
            :     kLiveRowCount = 0
            :   };
            :   void AddTablet(PartialSupportMetricType type, const std::string& tablet_id);
            :   void RemoveTablet(PartialSupportMetricType type, const std::string& tablet_id);
            :   bool ContainsTablet(PartialSupportMetricType type, const std::string& tablet_id) const;
            :   bool ValidMetric(PartialSupportMetricType type) const;
I like this approach to keeping track of tablets that don't support live row count, and then a table-wide method to figure out if the table itself supports it.

Since we're not actually using the enum for anything (except as descriptive text for these functions), let's lay these out in a hopefully more intuitive way:

- void AddTabletNoLiveRowCount(tablet_id)
- void DeleteTabletNoLiveRowCount(tablet_id)
- bool ContainsTabletNoLiveRowCount(tablet_id)
- bool TableSupportsLiveRowCount()


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h@61
PS6, Line 61:   // Tablet ids which not support to report 'live_row_count'.
"IDs of tablets which do not support reporting live row count."


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/master/table_metrics.h@62
PS6, Line 62:   std::unordered_set<std::string> not_support_live_row_count_tablet_ids_;
Maybe tablet_ids_no_live_row_count_?


http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/14507/1/src/kudu/tablet/metadata.proto@199
PS1, Line 199:   required uint64 on_disk_size = 1;
> The new added ReportedTabletStatsPB in ReportedTabletPB is optional, but he
This is a somewhat ideological issue, but the summary is that in proto3, 'required' fields don't exist anymore, so we should prepare for that by annotating every new field as 'optional'.

More background: https://stackoverflow.com/questions/31801257/why-required-and-optional-is-removed-in-protocol-buffers-3


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/rowset.h@167
PS6, Line 167:   // TODO(yingchun): change int64_t to uint64_t
Will you be handling this in a follow-up patch?


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h@303
PS6, Line 303:   // Count live rows of this tablet replica.
             :   //
             :   // Returns a non-ok status if failed.
"Counts the number of live rows in this tablet replica.

Returns a bad Status on failure."


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h@308
PS6, Line 308:   // Return live rows count of this tablet replica, used for metric only.
             :   // Return 0 if failed.
"Like CountLiveRows but returns 0 on failure."


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/tablet/tablet_replica.h@310
PS6, Line 310:   uint64_t CountLiveRowsForMetric() const;
How about renaming to CountLiveRowsNoFail? That it's only used for metrics will be obvious from the call site.


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h
File src/kudu/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@723
PS6, Line 723:   virtual scoped_refptr<Metric> snapshot() const = 0;
Does it matter that snapshot() doesn't preserve the source's values of m_epoch_ or m_invalid_for_merge_? If it did, could InvalidateIfNeededInMerge be simpler? I think it might: all metrics being merged together would have the same values of m_epoch_ (-1) and m_invalid_for_merge_, so no state would actually have to change. It'd just be a boolean "should we merge or not?" check.


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@775
PS6, Line 775:   // Set 'm_invalid_for_merge_', causing this metric to be invalid for merge.
The implementation is obvious from inspection, so let's make the comment higher level: "Causes this metric to be skipped during a merge."


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@780
PS6, Line 780:   // If one of the two metrics is invalid for merge, the result will be invalidate and
             :   // return true.
"Returns whether the merge of 'other' into this metric should be prohibited. If true, also ensures that this metric is invalidated."


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@819
PS6, Line 819:   // Whether this metric is invalid for merge.
             :   bool m_invalid_for_merge_ = false;
Why do we need this new per-metric state? You alluded to some problems with the previous implementation; could you elaborate on what happened?


http://gerrit.cloudera.org:8080/#/c/14507/6/src/kudu/util/metrics.h@951
PS6, Line 951:     auto gauge = Instantiate(entity, initial_value);
             :     gauge->InvalidateEpoch();
Could replace these two lines with a call to InstantiateHidden.



-- 
To view, visit http://gerrit.cloudera.org:8080/14507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54704d8cbd64a521e65aa3e95bf1a68f7757b7
Gerrit-Change-Number: 14507
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 21 Oct 2019 05:46:22 +0000
Gerrit-HasComments: Yes