You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Brock Noland (Code Review)" <ge...@cloudera.org> on 2016/09/18 23:00:09 UTC

[kudu-CR] [tests] fix tests - two cpus and fresh install

Brock Noland has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4446

Change subject: [tests] fix tests - two cpus and fresh install
......................................................................

[tests] fix tests - two cpus and fresh install

CodegenTest.TestCodeCache - fails on my two cpu host, t2.large
increasing the cache size resolves this failure.

Other various tests fail due to lack of lsof and low resource
limits.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
2 files changed, 4 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>

[kudu-CR] [tests] fix tests - two cpus and fresh install

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: [tests] fix tests - two cpus and fresh install
......................................................................


Patch Set 2:

I don't know much of anything about the codegen portion of Kudu, so I don't feel OK'ing even just bumping up the cache size in the test, especially without any understanding from the patch of why it failed with the lower size. I'm adding Todd to the review, who'll either know about this or know who knows about this.

-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4446

to look at the new patch set (#4).

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to lack of lsof and low resource
limits, which are not documented.

CodegenTest.TestCodeCache - fails on my two cpu host, t2.large
increasing the cache size resolves this failure. The test is
depending on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that capacity of SharedLRUCache is higher than the capacity
configured if the configured capacity is not divisible by the the number
of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs,
actual capacity is slightly higher.

CPU 2 => Capacity 10, 5/shard
CPU 4 => Capacity 12, 3/shard
CPU 8 => Capacity 16, 2/shard

Due to this calculation:

const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary
logging which logged each insert. Let's look at inserts into shard 0. Under the
4 CPU case, where each shard has a capacity of 3, shard 3 only sees three inserts
in pass 0 resulting in hits on the next pass:

pass: 0
Insert: hash = 460595995, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1

Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits.

pass: 0
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
M src/kudu/gutil/sysinfo.cc
3 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc
File docs/installation.adoc:

Line 56:     - Limits nproc and nofile greater than 32768
> I didn't hit nproc, I hit nofile. However, given the low default settings f
Well, looking at a server I'm running which has 406 tablets running, it's using 1036 threads. So, to cross 32k threads you'd probably need 10,000+ tablets, and a lot of other stuff would break first.

As an operator I'd probably recommend a lower nproc ulimit than 65536, because if you set it that high, then a runaway loop which is leaking threads in one process could easily use all the pids on a system and prevent other unrelated processes from starting threads (whereas I'd prefer just Kudu to crash)


-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix tests - two cpus and fresh install

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Brock Noland has posted comments on this change.

Change subject: [tests] fix tests - two cpus and fresh install
......................................................................


Patch Set 2:

That does look similar. I can look into this more. It's actually bothering me that I cannot explain exactly why it fails.

-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc
File docs/installation.adoc:

Line 56:     - Limits nproc and nofile greater than 32768
which tests fail with nproc <= 32768? If this is just a test requirement, I dont think it should be in the installation adoc. Plus I'm surprised we use that many threads in any test (perhaps it's a bug)


http://gerrit.cloudera.org:8080/#/c/4446/5/src/kudu/gutil/sysinfo.cc
File src/kudu/gutil/sysinfo.cc:

Line 63: DEFINE_int32(num_cpus, 0, "Override number of CPUs by setting to a value > 0");
can you flag this as advanced? also perhaps say something like 'Override the auto-detected number of CPUs on this system' since you can't actually override the number of CPUs :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix tests - two cpus and fresh install

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Brock Noland has posted comments on this change.

Change subject: [tests] fix tests - two cpus and fresh install
......................................................................


Patch Set 2:

Yep sounds good to me. I haven't fully investigated why the change fixes the test for me.

The cache is sharded by the numbers of cpus. Based on the following comment I was thinking I am thinking I edge case where I didn't see any hits but would with 4-procs.

      // Otherwise, we expect to have gotten some hits.
      // If our cache were a perfect LRU implementation, then we would actually
      // expect 0 hits here as well, since we are accessing the entries in
      // exactly the same order as we inserted them, and thus would evict
      // an entry before we look for it again. But, our LRU cache is sharded
      // so we expect to get some hits on the second time.

based on that plus this comment:

      // Basic test for the CompilationManager code cache.
      // This runs a bunch of compilation tasks and ensures that the cache
      // sometimes hits on the second attempt for the same projection.

I am thinking it's a safe change, but obviously important to verify that is actually a safe change.

-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Brock Noland has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................


Patch Set 5:

Makes sense. I thought /proc/sys/kernel/pid_max was higher on 64bit systems than the 32bit default.

-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4446

to look at the new patch set (#5).

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to lack of lsof and low resource
limits, which are not documented.

CodegenTest.TestCodeCache - fails on my two cpu host, t2.large
increasing the cache size resolves this failure. The test is
depending on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that capacity of SharedLRUCache is higher than the capacity
configured if the configured capacity is not divisible by the the number
of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs,
actual capacity is slightly higher.

CPU 2 => Capacity 10, 5/shard
CPU 4 => Capacity 12, 3/shard
CPU 8 => Capacity 16, 2/shard

Due to this calculation:

const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary
logging which logged each insert. Let's look at inserts into shard 0. Under the
4 CPU case, where each shard has a capacity of 3, shard 3 only sees three inserts
in pass 0 resulting in hits on the next pass:

pass: 0
Insert: hash = 460595995, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1

Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits.

pass: 0
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
M src/kudu/gutil/sysinfo.cc
3 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tests] fix tests - two cpus and fresh install

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4446

to look at the new patch set (#3).

Change subject: [tests] fix tests - two cpus and fresh install
......................................................................

[tests] fix tests - two cpus and fresh install

Various tests fail due to lack of lsof and low resource
limits, which are not documented.

CodegenTest.TestCodeCache - fails on my two cpu host, t2.large
increasing the cache size resolves this failure. The test is
depending on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that capacity of SharedLRUCache is higher than the capacity
configured if the configured capacity is not divisible by the the number
of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs,
actual capacity is slightly higher.

CPU 2 => Capacity 10, 5/shard
CPU 4 => Capacity 12, 3/shard
CPU 8 => Capacity 16, 2/shard

Due to this calculation:

const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary
logging which logged each insert. Let's look at inserts into shard 0. Under the
4 CPU case, where each shard has a capacity of 3, shard 3 only sees three inserts
in pass 0 resulting in hits on the next pass:

pass: 0
Insert: hash = 460595995, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1

Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits.

pass: 0
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
M src/kudu/gutil/sysinfo.cc
3 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4446/6//COMMIT_MSG
Commit Message:

PS6, Line 16: SharedLRUCache
Nit: this should be ShardedLRUCache. Below too.


PS6, Line 17: 2) the fact that capacity of SharedLRUCache is higher than the capacity
            : configured if the configured capacity is not divisible by the the number
            : of CPUs.
            : 
            : For example, the capacity is set here:
            : 
            :   FLAGS_codegen_cache_capacity = 10;
            : 
            : However, if the capacity is not perfectly divisible by the number of CPUs,
            : actual capacity is slightly higher.
            : 
            : CPU 2 => Capacity 10, 5/shard
            : CPU 4 => Capacity 12, 3/shard
            : CPU 8 => Capacity 16, 2/shard
            : 
            : Due to this calculation:
            : 
            : const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;
I appreciate the detailed explanation here, but I'm having a hard time understanding it without looking at the code too. Maybe you can work in how num_shards is calculated? And make explicit that the actual capacity is per_shard * num_shards?

Also, in each of the examples you've provided, the capacity is perfectly divisible by the number of CPUs. Maybe add some negative examples?


PS6, Line 47: shard's
Nit: shards


http://gerrit.cloudera.org:8080/#/c/4446/6/docs/installation.adoc
File docs/installation.adoc:

Please update the various script examples in this file, as well as the SLES instructions.


http://gerrit.cloudera.org:8080/#/c/4446/6/src/kudu/codegen/codegen-test.cc
File src/kudu/codegen/codegen-test.cc:

Line 373:   FLAGS_codegen_cache_capacity = 20;
This fixes the test for two CPUs, but can we either fix the test to be more robust when the capacity isn't exactly 20, or fix the underlying sharding implementation to always use the provided capacity (maybe we'd need to enforce only certain kinds of capacity values e.g. powers of 2 to do this)?


http://gerrit.cloudera.org:8080/#/c/4446/6/src/kudu/gutil/sysinfo.cc
File src/kudu/gutil/sysinfo.cc:

PS6, Line 64:              "Advanced option. Use at your own risk");
I think what Todd meant by "flag it as advanced" was that you use TAG_FLAG to explicitly tag it.

However, tag_flags is part of the util module, which depends on gutil, so doing so would yield a circular dependency. One way to address it would be as follows:
1) Add a kudu::NumCPUs() method somewhere in the util module.
2) Make it call base::NumCPUs().
3) Define this new flag and handle the override there.
4) Change all existing callers to use kudu::NumCPUs() instead.


-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Brock Noland has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4446/6//COMMIT_MSG
Commit Message:

PS6, Line 17: 2) the fact that capacity of SharedLRUCache is higher than the capacity
            : configured if the configured capacity is not divisible by the the number
            : of CPUs.
            : 
            : For example, the capacity is set here:
            : 
            :   FLAGS_codegen_cache_capacity = 10;
            : 
            : However, if the capacity is not perfectly divisible by the number of CPUs,
            : actual capacity is slightly higher.
            : 
            : CPU 2 => Capacity 10, 5/shard
            : CPU 4 => Capacity 12, 3/shard
            : CPU 8 => Capacity 16, 2/shard
            : 
            : Due to this calculation:
            : 
            : const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;
> I appreciate the detailed explanation here, but I'm having a hard time unde
Shoot, my explanation as subpar. I will update, but in each of those examples the configured capacity is 10, but the actual capacity is 10, 12, 16.

More details coming.


-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix tests - two cpus and fresh install

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tests] fix tests - two cpus and fresh install
......................................................................


Patch Set 2:

Is this related to KUDU-1590?

-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4446

to look at the new patch set (#6).

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to lack of lsof and low resource
limits, which are not documented.

CodegenTest.TestCodeCache - fails on my two cpu host, t2.large
increasing the cache size resolves this failure. The test is
depending on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that capacity of SharedLRUCache is higher than the capacity
configured if the configured capacity is not divisible by the the number
of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs,
actual capacity is slightly higher.

CPU 2 => Capacity 10, 5/shard
CPU 4 => Capacity 12, 3/shard
CPU 8 => Capacity 16, 2/shard

Due to this calculation:

const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary
logging which logged each insert. Let's look at inserts into shard 0. Under the
4 CPU case, where each shard has a capacity of 3, shard 3 only sees three inserts
in pass 0 resulting in hits on the next pass:

pass: 0
Insert: hash = 460595995, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1

Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits.

pass: 0
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
M src/kudu/gutil/sysinfo.cc
3 files changed, 11 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

Posted by "Brock Noland (Code Review)" <ge...@cloudera.org>.
Brock Noland has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc
File docs/installation.adoc:

Line 56:     - Limits nproc and nofile greater than 32768
> which tests fail with nproc <= 32768? If this is just a test requirement, I
I didn't hit nproc, I hit nofile. However, given the low default settings for both of these, I was just suggesting we just document a reasonable setting "generally" which is why I placed it here. Happy to move it somewhere else or adjust as needed.

BTW, CM sets both limits to:

Max processes             65536                65536                processes 
Max open files            32768                32768                files  

for everything running as a child process. As such, it might make sense to ensure the most tested configuration is specified here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Brock Noland <br...@phdata.io>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes