You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2019/03/01 23:54:03 UTC

[kudu-CR] KUDU-2645: tablet: Support deduplication of deleted rows in MergeIterator

Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12205 )

Change subject: KUDU-2645: tablet: Support deduplication of deleted rows in MergeIterator
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12205/3/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/12205/3/src/kudu/common/generic_iterators.cc@439
PS3, Line 439:       int cmp;
> Good observation. I think smallest.resize(1, iter.get()) would be more effi
Oops, that resize() doesn't work that way when shrinking the vector, but smallest.clear(); smallest.emplace_back() looks to be fine.


http://gerrit.cloudera.org:8080/#/c/12205/3/src/kudu/common/generic_iterators.cc@441
PS3, Line 441:         // If we have a candidate for smallest row, compare it against the
> My take is that if I use emplace_back(), I don't have to think about whethe
Well, I ran a little test and indeed emplace_back() is faster.

  #include <iostream>
  #include <string>
  #include <vector>

  using std::cerr;
  using std::endl;
  using std::string;
  using std::vector;

  void test_emplace_back(vector<int>& vec, int count) {
    for (int i = 0; i < count; i++) {
      vec.emplace_back(i);
    }
  }

  void test_push_back(vector<int>& vec, int count) {
    for (int i = 0; i < count; i++) {
      vec.push_back(i);
    }
  }

  int main(int argc, const char** argv) {
    if (argc == 1) {
      cerr << "Usage: " << argv[0] << " emplace_back|push_back" << endl;
      return 1;
    }
    string arg = argv[1];

    constexpr int kCount = 100 * 1000 * 1000;

    vector<int> vec;
    vec.reserve(kCount);
    if (arg == "emplace_back") {
      test_emplace_back(vec, kCount);
      return 0;
    }
    if (arg == "push_back") {
      test_push_back(vec, kCount);
      return 0;
    }

    return 1;
  }

Compiled by g++ with -O3 and -std=c++11, I get the following runtime numbers (which are roughly representative):

  $ time ./emplace-test emplace_back

  real    0m0.192s
  user    0m0.093s
  sys     0m0.100s

  $ time ./emplace-test push_back

  real    0m0.280s
  user    0m0.176s
  sys     0m0.104s


http://gerrit.cloudera.org:8080/#/c/12205/3/src/kudu/common/generic_iterators.cc@479
PS3, Line 479:           // We found the single live instance of the row.
> I think you can deduplicate the two calls to CopyRow (one here and the othe
Gotcha, done.


http://gerrit.cloudera.org:8080/#/c/12205/4/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/12205/4/src/kudu/common/generic_iterators.cc@474
PS4, Line 474:       int live_rows_found = 0;
> Could you make sure this doesn't trigger a not-used warning in RELEASE buil
Done


http://gerrit.cloudera.org:8080/#/c/12205/4/src/kudu/common/generic_iterators.cc@503
PS4, Line 503:     unordered_set<MergeIterState*> exhausted;
> I think this can be a simple vector; aren't we guaranteed that there are no
The point of using the set data structure is to get O(1) lookup within the if-clause in std::remove_if()

I'd be surprised if the allocation overhead were significant here because tcmalloc should be using thread-local storage for these relatively small STL data structures.

So I just ran TestMerge 20 times on the current patch with --num-lists 100, and on a version of it after reverting the bulk of the changes in generic_iterators.cc back to the existing master branch code, and the performance was essentially identical for the "real" time measurement, in seconds: existing code = min 0.118, median 0.126, max 0.147; new code = min 0.118, median 0.125, max 0.143

It appears that the additional tcmalloc (non-central freelist) allocation overhead for the vector and unordered_set elements are in the noise compared to the merge process itself, which is fairly CPU intensive.



-- 
To view, visit http://gerrit.cloudera.org:8080/12205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00614b3fa5c6b4e7b620bb78489e24c5ad44daee
Gerrit-Change-Number: 12205
Gerrit-PatchSet: 4
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Fri, 01 Mar 2019 23:54:03 +0000
Gerrit-HasComments: Yes