You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2020/03/15 06:58:16 UTC

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Hello Adar Dembo,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/15439

to review the following change.


Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................

client: micro-optimizations to reduce CPU and allocations

Various small hot-spots I saw while profiling TSBS workloads which do a
high number of very short scans:

- replace vector<RemoteTabletServer> with boost's small_vector in spots
  in the client where we expect to usually have a very small number of
  qualifying servers.

- Optimize ResourceMetrics to use a dense_hash_map instead of a
  std::map, which is more CPU efficient. Additionally, key the map based
  on StringPieces instead of std::strings, since in our use case we are
  always using strings with eternal lifetime (coming from the protobuf
  Descriptor) here.

- Avoid creating a vector<FieldDescriptor*> when updating resource
  metrics. Instead iterate over the fields using index-based APIs that
  don't allocate.

- Add a move constructor for KuduSchema

- Use make_shared for ColumnSchema shared_ptrs to avoid some atomic ops

This cuts cycles used in tcmalloc in kudu-tsdbd for one workload by about
11%.

Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
---
M src/kudu/client/client-internal.cc
M src/kudu/client/resource_metrics-internal.h
M src/kudu/client/resource_metrics.cc
M src/kudu/client/resource_metrics.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/schema.cc
M src/kudu/client/schema.h
M src/kudu/common/schema.h
8 files changed, 40 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/15439/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 3:

Nothing to add to Andrew's comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 20 Mar 2020 03:52:25 +0000
Gerrit-HasComments: No

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has removed a vote on this change.

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 20 Mar 2020 04:09:10 +0000
Gerrit-HasComments: No

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/15439/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15439/1//COMMIT_MSG@26
PS1, Line 26: - Add a move constructor for KuduSchema
> Where does this actually get used?
KuduSchema::FromSchema which gets called once per scan by ScanConfiguration:

    @     0x7f42f9690339 kudu::client::KuduSchema::KuduSchema()
    @     0x7f42f9690ad1 kudu::client::KuduSchema::FromSchema()
    @     0x7f42f9664940 kudu::client::ScanConfiguration::ScanConfiguration()
    @     0x7f42f967beae kudu::client::KuduScanner::Data::Data()


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h
File src/kudu/client/resource_metrics-internal.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h@47
PS1, Line 47:   void Increment(StringPiece name, int64_t amount);
> nit: might be obvious, but maybe doc that this should only be used when we 
yep, forgot that i hadn't finished cleaning up tihs commit when I sent it.


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h@55
PS1, Line 55:   std::list<std::string> owned_strings_;
> Isn't std::deque generally a better choice over std::list? If not here, cou
went with a dense_hash_set instead


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics.cc
File src/kudu/client/resource_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics.cc@60
PS1, Line 60:   owned_strings_.emplace_back(name);
> This suggests that owned_strings_ should be some sort of deduplicating data
sure, that's fair. I'll use a set. I also want to mark this as deprecated since I don't think clients should be using this API anyway (it should have been private from the beginning)


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.h
File src/kudu/client/schema.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.h@718
PS1, Line 718:   explicit KuduSchema(Schema&& schema);
> How will a pre-C++11 compiler (which includes this header) respond to this?
added an ifdef around it for C++11. I think it's fine for it to just not show up in the header since it's only called by the implementation.

Given this isn't a move constructor, but rather a constructor from a different class, I dont think it needs the move-assignment


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.cc
File src/kudu/client/schema.cc:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.cc@826
PS1, Line 826: (Schema&& 
> Nah this actually makes sense for a move constructor.
I could go either way on this -- this isn't a move constructor because it takes a Schema argument, not a KuduSchema argument. But I think this is more canonical for "move-like" constructors.



-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 18 Mar 2020 22:17:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15439/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15439/1//COMMIT_MSG@26
PS1, Line 26: - Add a move constructor for KuduSchema
Where does this actually get used?


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h
File src/kudu/client/resource_metrics-internal.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h@55
PS1, Line 55:   std::list<std::string> owned_strings_;
Isn't std::deque generally a better choice over std::list? If not here, could you add a comment explaining why?


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics.cc
File src/kudu/client/resource_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics.cc@60
PS1, Line 60:   owned_strings_.emplace_back(name);
This suggests that owned_strings_ should be some sort of deduplicating data structure (like an unordered_set); why should two calls to Increment() with the same value in 'name' yield two different owned strings?


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.h
File src/kudu/client/schema.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.h@718
PS1, Line 718:   explicit KuduSchema(Schema&& schema);
How will a pre-C++11 compiler (which includes this header) respond to this?

Also, should add a move-assignment operator for completeness.


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.cc
File src/kudu/client/schema.cc:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.cc@826
PS1, Line 826: (Schema&& 
> nit: pass by value and move?
Nah this actually makes sense for a move constructor.



-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Mar 2020 07:49:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 4: Verified+1

Test failures seem unrelated (two KUDU-2432 issues, one java test which wouldn't cover client code)


-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 20 Mar 2020 05:41:13 +0000
Gerrit-HasComments: No

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h
File src/kudu/client/resource_metrics-internal.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h@55
PS1, Line 55:   int64_t GetMetric(const std::string& name) const;
> went with a dense_hash_set instead
Er, you went with std::set. Was that intentional? Or did you want dense_hash_set?


http://gerrit.cloudera.org:8080/#/c/15439/2/src/kudu/client/resource_metrics.cc
File src/kudu/client/resource_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/15439/2/src/kudu/client/resource_metrics.cc@29
PS2, Line 29: #include "kudu/client/resource_metrics-internal.h" 
Trailing whitespace here.


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.h
File src/kudu/client/schema.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.h@718
PS1, Line 718: #if __cplusplus >= 201103
> added an ifdef around it for C++11. I think it's fine for it to just not sh
Oh oops, Schema vs. KuduSchema.



-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 18 Mar 2020 23:25:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/client-internal.cc@36
PS3, Line 36: #include <boost/container/vector.hpp>
Do we need this?


http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/resource_metrics-internal.h
File src/kudu/client/resource_metrics-internal.h:

http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/resource_metrics-internal.h@28
PS3, Line 28: #include <sparsehash/dense_hash_set>
Remove? Surprised IWYU didn't complain about this.



-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 19 Mar 2020 23:11:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15439

to look at the new patch set (#4).

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................

client: micro-optimizations to reduce CPU and allocations

Various small hot-spots I saw while profiling TSBS workloads which do a
high number of very short scans:

- replace vector<RemoteTabletServer> with boost's small_vector in spots
  in the client where we expect to usually have a very small number of
  qualifying servers.

- Optimize ResourceMetrics to use a dense_hash_map instead of a
  std::map, which is more CPU efficient. Additionally, key the map based
  on StringPieces instead of std::strings, since in our use case we are
  always using strings with eternal lifetime (coming from the protobuf
  Descriptor) here.

- Avoid creating a vector<FieldDescriptor*> when updating resource
  metrics. Instead iterate over the fields using index-based APIs that
  don't allocate.

- Add a KuduSchema constructor taking a Schema rvalue-reference. This is
  called once per scan in when setting up ScanConfiguration.

- Use make_shared for ColumnSchema shared_ptrs to avoid some atomic ops

This cuts cycles used in tcmalloc in kudu-tsdbd for one workload by about
11%.

Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
---
M src/kudu/client/client-internal.cc
M src/kudu/client/resource_metrics-internal.h
M src/kudu/client/resource_metrics.cc
M src/kudu/client/resource_metrics.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/schema.cc
M src/kudu/client/schema.h
M src/kudu/common/schema.h
8 files changed, 62 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/15439/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15439

to look at the new patch set (#2).

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................

client: micro-optimizations to reduce CPU and allocations

Various small hot-spots I saw while profiling TSBS workloads which do a
high number of very short scans:

- replace vector<RemoteTabletServer> with boost's small_vector in spots
  in the client where we expect to usually have a very small number of
  qualifying servers.

- Optimize ResourceMetrics to use a dense_hash_map instead of a
  std::map, which is more CPU efficient. Additionally, key the map based
  on StringPieces instead of std::strings, since in our use case we are
  always using strings with eternal lifetime (coming from the protobuf
  Descriptor) here.

- Avoid creating a vector<FieldDescriptor*> when updating resource
  metrics. Instead iterate over the fields using index-based APIs that
  don't allocate.

- Add a KuduSchema constructor taking a Schema rvalue-reference. This is
  called once per scan in when setting up ScanConfiguration.

- Use make_shared for ColumnSchema shared_ptrs to avoid some atomic ops

This cuts cycles used in tcmalloc in kudu-tsdbd for one workload by about
11%.

Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
---
M src/kudu/client/client-internal.cc
M src/kudu/client/resource_metrics-internal.h
M src/kudu/client/resource_metrics.cc
M src/kudu/client/resource_metrics.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/schema.cc
M src/kudu/client/schema.h
M src/kudu/common/schema.h
8 files changed, 64 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/15439/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................

client: micro-optimizations to reduce CPU and allocations

Various small hot-spots I saw while profiling TSBS workloads which do a
high number of very short scans:

- replace vector<RemoteTabletServer> with boost's small_vector in spots
  in the client where we expect to usually have a very small number of
  qualifying servers.

- Optimize ResourceMetrics to use a dense_hash_map instead of a
  std::map, which is more CPU efficient. Additionally, key the map based
  on StringPieces instead of std::strings, since in our use case we are
  always using strings with eternal lifetime (coming from the protobuf
  Descriptor) here.

- Avoid creating a vector<FieldDescriptor*> when updating resource
  metrics. Instead iterate over the fields using index-based APIs that
  don't allocate.

- Add a KuduSchema constructor taking a Schema rvalue-reference. This is
  called once per scan in when setting up ScanConfiguration.

- Use make_shared for ColumnSchema shared_ptrs to avoid some atomic ops

This cuts cycles used in tcmalloc in kudu-tsdbd for one workload by about
11%.

Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Reviewed-on: http://gerrit.cloudera.org:8080/15439
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/client/client-internal.cc
M src/kudu/client/resource_metrics-internal.h
M src/kudu/client/resource_metrics.cc
M src/kudu/client/resource_metrics.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/schema.cc
M src/kudu/client/schema.h
M src/kudu/common/schema.h
8 files changed, 62 insertions(+), 12 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Todd Lipcon: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h
File src/kudu/client/resource_metrics-internal.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h@55
PS1, Line 55:   int64_t GetMetric(const std::string& name) const;
> Er, you went with std::set. Was that intentional? Or did you want dense_has
yea I went with set<> instead, because dense_hash_set<> requires calling a 'set_empty_key(string)' method. That method would have required allocating an empty string every time this class was instantiated, etc. And since we expect this set to always be empty, we don't want that extra overhead.



-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 19 Mar 2020 21:23:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h
File src/kudu/client/resource_metrics-internal.h:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/resource_metrics-internal.h@47
PS1, Line 47:   void Increment(StringPiece name, int64_t amount);
nit: might be obvious, but maybe doc that this should only be used when we have some guarantee that the stringpiece will outlive these metrics?


http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.cc
File src/kudu/client/schema.cc:

http://gerrit.cloudera.org:8080/#/c/15439/1/src/kudu/client/schema.cc@826
PS1, Line 826: (Schema&& 
nit: pass by value and move?



-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 15 Mar 2020 09:00:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15439 )

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/client-internal.cc@36
PS3, Line 36: #include <boost/container/vector.hpp>
> Do we need this?
weirdly IWYU wants both of these


http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/resource_metrics-internal.h
File src/kudu/client/resource_metrics-internal.h:

http://gerrit.cloudera.org:8080/#/c/15439/3/src/kudu/client/resource_metrics-internal.h@28
PS3, Line 28: #include <sparsehash/dense_hash_set>
> Remove? Surprised IWYU didn't complain about this.
ah, I bet it didn't because it only defines non-instantiated templates? I seem to recall there is some weirdness about templates and IWYU



-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 20 Mar 2020 03:53:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: micro-optimizations to reduce CPU and allocations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15439

to look at the new patch set (#3).

Change subject: client: micro-optimizations to reduce CPU and allocations
......................................................................

client: micro-optimizations to reduce CPU and allocations

Various small hot-spots I saw while profiling TSBS workloads which do a
high number of very short scans:

- replace vector<RemoteTabletServer> with boost's small_vector in spots
  in the client where we expect to usually have a very small number of
  qualifying servers.

- Optimize ResourceMetrics to use a dense_hash_map instead of a
  std::map, which is more CPU efficient. Additionally, key the map based
  on StringPieces instead of std::strings, since in our use case we are
  always using strings with eternal lifetime (coming from the protobuf
  Descriptor) here.

- Avoid creating a vector<FieldDescriptor*> when updating resource
  metrics. Instead iterate over the fields using index-based APIs that
  don't allocate.

- Add a KuduSchema constructor taking a Schema rvalue-reference. This is
  called once per scan in when setting up ScanConfiguration.

- Use make_shared for ColumnSchema shared_ptrs to avoid some atomic ops

This cuts cycles used in tcmalloc in kudu-tsdbd for one workload by about
11%.

Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
---
M src/kudu/client/client-internal.cc
M src/kudu/client/resource_metrics-internal.h
M src/kudu/client/resource_metrics.cc
M src/kudu/client/resource_metrics.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/schema.cc
M src/kudu/client/schema.h
M src/kudu/common/schema.h
8 files changed, 63 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/15439/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I705aef9f2e57d44f387b58650279130ff329666d
Gerrit-Change-Number: 15439
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>