You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2016/11/19 00:55:28 UTC

[kudu-CR] util: add file cache

Hello Dan Burkert, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/5146

to review the following change.

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templatizing the cache to one
particular open file interface vs. the current approach. The templatized
cache looked like spaghetti due to calls between templatized methods and
generic methods, and it required (surprisingly) more casting as the
underlying LRU cache isn't templatized. Plus, it violated principle #3 in
that it made it impossible to use a single cache for multiple open file
types, and thus for different parts of Kudu.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shareds used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,716 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#6).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,511 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#3).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templatizing the cache to one
particular open file interface vs. the current approach. The templatized
cache looked like spaghetti due to calls between templatized methods and
generic methods, and it required (surprisingly) more casting as the
underlying LRU cache isn't templatized. Plus, it violated principle #3 in
that it made it impossible to use a single cache for multiple open file
types, and thus for different parts of Kudu.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shareds used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,683 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 48: class FileCacheTest : public KuduTest {
> That's already tested indirectly in AssertFdsAndDescriptors(). Since NumDes
Sounds good.


http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache.h
File src/kudu/util/file_cache.h:

Line 103:   FileCache(const std::string& cache_name,
> The constructor wasn't actually moving cache_name, so passing by cref elide
but you are using it now, it it could be moved, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] util: add file cache

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 8:

(4 comments)

only 1/2 way through but going to switch to r8 for the rest.

http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-stress-test.cc
File src/kudu/util/file_cache-stress-test.cc:

Line 133:     deque<shared_ptr<FileType>> files;
It took me a bit to figure out the difference between this and the global pool of files -- you may want to add a comment like "Active opened files in this consumer thread"


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test-util.h
File src/kudu/util/file_cache-test-util.h:

Line 68:       CHECK_LE(open_fd_count, max_fd_count);
This should include an error message, since the LOG_EVERY_N may not have fired since the count was less than the max.


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 72:   void AssertFdsAndDescriptors(int num_expected_fds,
at every call site you are offsetting by this->initial_open_fds_, maybe the method could do that automatically?


Line 101:   {
Is this scope necessary?


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 9: Verified-1

Don't merge yet, I've found a race that needs to be addressed.

-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#10).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,628 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5146/2//COMMIT_MSG
Commit Message:

PS2, Line 41: hareds
> shards?
Done


http://gerrit.cloudera.org:8080/#/c/5146/2/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 62:   string::size_type colon = cache_key.find(':');
> Can you rely on the fact that the colon is at position 3 in order to skip t
Yeah, I guess between the DCHECK and the test coverage, that would be safe.


Line 81:   virtual void EvictedEntry(Slice key, Slice value) OVERRIDE {
> remove virtual and change OVERRIDE to override.
Done


Line 343:         + once_.memory_footprint_excluding_this()
> operator on preceding line
Done


http://gerrit.cloudera.org:8080/#/c/5146/2/src/kudu/util/file_cache.h
File src/kudu/util/file_cache.h:

Line 80: class FileCache {
> document thread safety guarantees of the class where otherwise unspecified.
Done


Line 130:   std::string Dump() const;
> I think we typically call these 'ToStringDebug'.
You mean ToDebugString? Sure, I'll rename it.


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#4).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templatizing the cache to one
particular open file interface vs. the current approach. The templatized
cache looked like spaghetti due to calls between templatized methods and
generic methods, and it required (surprisingly) more casting as the
underlying LRU cache isn't templatized. Plus, it violated principle #3 in
that it made it impossible to use a single cache for multiple open file
types, and thus for different parts of Kudu.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,686 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#9).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,506 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Dinesh Bhat (Code Review)" <ge...@cloudera.org>.
Dinesh Bhat has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 5:

(11 comments)

Excellent stuff Adar, took a first pass, still to see tests and deletion path. There are some minor comments and questions below..

http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/cache.cc
File src/kudu/util/cache.cc:

PS5, Line 34: Override all cache implementations to use just one shard
Can we put a comment describing the motive for this flag ?


http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

PS5, Line 57: > 3
nit: could we compare 'size == 4' here  ?


PS5, Line 62: *reinterpret_cast<void**
I am curious if we need a double indirection first and then access the data pointer using '*' in this casting. i.e, can't we do the casting like 
"reinterpret_cast<void*>(s.mutable_data())" ?


PS5, Line 67: *reinterpret_cast<void**
Same as above.


PS5, Line 71: EvictionCallback
Is there a reason to keep this under anonymous namespace and not make it part of FileCache ? Just to be consistent with other interfaces like CodeCache.


PS5, Line 142: FileCache::
Basic question: I am curious what's the motive for using the classname as prefix like that ? Does that mean ScopedOpenedDescriptor can not be member of any other class than FileCache ?


PS5, Line 173: FileCache::ScopedOpenedDescriptor
Could we use 'using FileCache::ClassMember' at the beginning and just keep ScopedOpedDescriptor for return type  ? Here and below ?


PS5, Line 275: if (out) {
What's the use case under which this could be false apart from the very first cache entry population which is covered by L293 below ?


PS5, Line 414: OpenExistingRandomAccessFile
Nit: I am not a c++ expert, just wondering if we can leverage function overloading here, since it looks like this routine and above OpenExistingRWFile bear almost same functionality. On the other hand overloading may just give us better readability, so you can as well ignore this comment.


http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/file_cache.h
File src/kudu/util/file_cache.h:

PS5, Line 75: .
Does this need to be in cache any longer when marked to-be-deleted ?


PS5, Line 80: ihe
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/cache.cc
File src/kudu/util/cache.cc:

PS8, Line 499: Cache* 
nit: consider returning std::unique_ptr<>


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache-test-util.h
File src/kudu/util/file_cache-test-util.h:

PS8, Line 40: upper_bound_(upper_bound),
nit: may be, replace upper_bound_ with max_fd_count_ and compute it right away given initial_fd_count_ and the upper_bound parameter.


PS8, Line 45: Start()
Are any guardrails needed to check if the Start() method is not called more than once?


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

PS8, Line 52: virtual 
nit: consider dropping 'virtual' since 'override' is present.


PS8, Line 131: NO_FATALS(this->AssertFdsAndDescriptors(this->initial_open_fds_ + 1, 2));
Does it make sense to check that f2 is invalidated at this point?


PS8, Line 173: DeleteFile(kFile2)
What happens if the file is moved/renamed when it's opened by the cache?  I hope cache keeps the file open and corresponding operations succeed.

Is it worth adding a test for that?


PS8, Line 228: 0
Is the offset set to 0 intentionally?  Does it make sense to vary the offset for test purposes as well?


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

PS8, Line 24: #include "kudu/gutil/casts.h"
Is this header really needed here?


PS8, Line 164: handle_.get();
Why not just

return handle_;

?


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-stress-test.cc
File src/kudu/util/file_cache-stress-test.cc:

Line 133:     deque<shared_ptr<FileType>> files;
> It took me a bit to figure out the difference between this and the global p
Done


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test-util.h
File src/kudu/util/file_cache-test-util.h:

Line 68:       CHECK_LE(open_fd_count, max_fd_count);
> This should include an error message, since the LOG_EVERY_N may not have fi
We'll already get one for free that'll include both open_fd_count and max_fd_count. That's the entirety of the KLOG_EVERY().


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 72:   void AssertFdsAndDescriptors(int num_expected_fds,
> at every call site you are offsetting by this->initial_open_fds_, maybe the
Done


Line 101:   {
> Is this scope necessary?
Done


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 177: ScopedOpenedDescriptor<FileType> BaseDescriptor<FileType>::InsertIntoCache(
> Any reason not to put this method and the one below inline with the class?
It's because they return ScopedOpenedDescriptors. That means we need to observe the following order:
1) Forward declare ScopedOpenedDescriptor.
2) Declare BaseDescriptor, which declares these two methods.
3) Declare ScopedOpenedDescriptor. We also define it inline, but that's orthogonal to this problem.
4) Define these BaseDescriptor methods, which we can do now that ScopedOpenedDescriptor has been declared.

...I wrote all that, then I tried inlining it anyway. It worked, provided ScopedOpenedDescriptor was forward declared. I guess I don't know how compilers work. :/


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5146/4/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

PS4, Line 57: DCHECK_EQ(':', cache_key[3])
> nit: for safety, consider adding a check for 'cache_key' size, so semantica
Done


PS4, Line 127: Cache::EvictionCallback* eviction_cb() const {
> nit: does it make sense to make this method private?
It's actually only used in one place, so I'll just do the dereferencing there.


PS4, Line 206: virtual
> nit for here and elsewhere while overriding a virtual mathod: keeping 'virt
I'll drop 'virtual'.


PS4, Line 393: FileCache::~FileCache() {
             : }
> nit: consider using '~FileCache() = default;' in the header file instead
Done


PS4, Line 499: "rwf"
> nit: does it make sense to use kRWFileCacheKeyPrefix and kRandomAccessFileC
Done


PS4, Line 556: IsFileMarkedForDeletionUnlocked
> nit: why not to return boolean instead?  It might be a bit cheaper, since S
Because all three callers do the same thing. They all wrap this in RETURN_NOT_OK(), and they all expect the returned Status (in the event that file is marked for deletion) to be the same.


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: util: add file cache
......................................................................


util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Reviewed-on: http://gerrit.cloudera.org:8080/5146
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,628 insertions(+), 4 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5146/4/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

PS4, Line 57: DCHECK_EQ(':', cache_key[3])
nit: for safety, consider adding a check for 'cache_key' size, so semantically it would be something like

DCHECK(cache_key.size() > 3 && cache_key[3] == ':');


PS4, Line 127: Cache::EvictionCallback* eviction_cb() const {
nit: does it make sense to make this method private?


PS4, Line 206: virtual
nit for here and elsewhere while overriding a virtual mathod: keeping 'virtual' along with 'override' seems to be too verbose.  May be, drop 'virtual'?  Or it is required by the code style?


PS4, Line 393: FileCache::~FileCache() {
             : }
nit: consider using '~FileCache() = default;' in the header file instead


PS4, Line 499: "rwf"
nit: does it make sense to use kRWFileCacheKeyPrefix and kRandomAccessFileCacheKeyPrefix constants here instead?


PS4, Line 556: IsFileMarkedForDeletionUnlocked
nit: why not to return boolean instead?  It might be a bit cheaper, since Status is a non-trivial object.


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#2).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templatizing the cache to one
particular open file interface vs. the current approach. The templatized
cache looked like spaghetti due to calls between templatized methods and
generic methods, and it required (surprisingly) more casting as the
underlying LRU cache isn't templatized. Plus, it violated principle #3 in
that it made it impossible to use a single cache for multiple open file
types, and thus for different parts of Kudu.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shareds used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,746 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/test_util.cc
File src/kudu/util/test_util.cc:

PS5, Line 261: readlink(p.c_str(), buf.get(), size)
Wouldn't it be cheaper to use stat() call instead?  Using stat()/lstat() would allow to skip allocation of the buffer.

Another observation for MacOS X is that if using readlink command-line utility on /dev/fd/xxx items always return an error (not sure whether it calls readlink() system call).  Using the stat cmd-line utility at least returns OK for 0, 1 and 2.


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#8).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,509 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 3:

(6 comments)

Still reviewing, but I'm going to move to R3, so putting what I've got on R2.

http://gerrit.cloudera.org:8080/#/c/5146/2//COMMIT_MSG
Commit Message:

PS2, Line 41: hareds
shards?


http://gerrit.cloudera.org:8080/#/c/5146/2/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 62: RWFile* CacheValueToRWFile(Slice s) {
Can you rely on the fact that the colon is at position 3 in order to skip the find call?

    DCHECK_EQ(cache_key[3], ':');
    return cache_key.substr(4);


Line 81:                   Substitute("Could not close file $0", rwf->filename()));
remove virtual and change OVERRIDE to override.


Line 343:   friend class FileCache;
operator on preceding line


http://gerrit.cloudera.org:8080/#/c/5146/2/src/kudu/util/file_cache.h
File src/kudu/util/file_cache.h:

Line 80:   //
document thread safety guarantees of the class where otherwise unspecified.


Line 130:   template <class FileType>
I think we typically call these 'ToStringDebug'.


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/cache.cc
File src/kudu/util/cache.cc:

PS5, Line 34: Override all cache implementations to use just one shard
> Can we put a comment describing the motive for this flag ?
Done


http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

PS5, Line 57: > 3
> nit: could we compare 'size == 4' here  ?
This is moot; the code in question has been deleted.


PS5, Line 62: *reinterpret_cast<void**
> I am curious if we need a double indirection first and then access the data
That causes a crash. The reasoning is: the slice's data is a memory address, so we need to treat it as a pointer to an RWFile pointer, then dereference the first level of indirection to get to the actual RWFile pointer.

If that doesn't make sense, you're in good company; I find this very counter-intuitive myself.


PS5, Line 67: *reinterpret_cast<void**
> Same as above.
See above.


PS5, Line 71: EvictionCallback
> Is there a reason to keep this under anonymous namespace and not make it pa
If it's nested in the FileCache, it has to be declared in file_cache.h. I'd rather hide that implementation detail, hence the anonymous namespace.


PS5, Line 142: FileCache::
> Basic question: I am curious what's the motive for using the classname as p
This means that ScopedOpenedDescriptor is a nested class within FileCache. I used it primarily to deal with member visibility issues, though it's a moot point now.


PS5, Line 173: FileCache::ScopedOpenedDescriptor
> Could we use 'using FileCache::ClassMember' at the beginning and just keep 
Moot; it's no longer a nested class.


PS5, Line 275: if (out) {
> What's the use case under which this could be false apart from the very fir
Just that one, but it's legitimate. Suppose you open a descriptor, close it, then open it again. The call to ReopenFileIfNecessary() the second time around will have a cache hit.


PS5, Line 414: OpenExistingRandomAccessFile
> Nit: I am not a c++ expert, just wondering if we can leverage function over
The Google style guide frowns on using function overloading. In any case, it's a moot point now that FileCache is templatized.


http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/file_cache.h
File src/kudu/util/file_cache.h:

PS5, Line 75: .
> Does this need to be in cache any longer when marked to-be-deleted ?
Possibly. If we've marked the file as deleted, it's because there's an outstanding descriptor for it. Forcefully evicting the file from the cache could cause it to get reinserted right back by the client upon the next access.

So I don't think deferred deletion should imply anything about the eviction of a file.


PS5, Line 80: ihe
> typo
Done


http://gerrit.cloudera.org:8080/#/c/5146/5/src/kudu/util/test_util.cc
File src/kudu/util/test_util.cc:

PS5, Line 261: readlink(p.c_str(), buf.get(), size)
> Wouldn't it be cheaper to use stat() call instead?  Using stat()/lstat() wo
In retrospect, we're assured that the fd count will be inflated by 1 for kProcSelfFd, so we can just subtract it at the end.

Besides, the only safe way to use this function is to call it twice and look at the delta.


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#7).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,507 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 48: class FileCacheTest : public KuduTest {
> Consider adding a test of RunDescriptorExpiry.
That's already tested indirectly in AssertFdsAndDescriptors(). Since NumDescriptorsForTests includes expired weak pointers, we have to AssertEventually() that the thread removes them (and it does).


http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 41: DEFINE_int32(file_cache_expiry_period_ms, 1000,
> This seems pretty aggressive.  What's the downside to a longer expiry like 
I don't mind increasing it. I was mostly concerned about pathological cases like a tight loop that does nothing but open and close descriptors, but maybe that's paranoid.

Will use a minute.


Line 389: FileCache<FileType>::FileCache(const string& cache_name,
> Maybe I'm missing it, but where are you actually creating the thread?  Ther
It's in the Init() function. To be faithful to the kudu::Thread class, we should return the Status of a failed Thread::Create(), which means it can't happen in the cache constructor.


http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache.h
File src/kudu/util/file_cache.h:

Line 103:   FileCache(const std::string& cache_name,
> why this change?
The constructor wasn't actually moving cache_name, so passing by cref elides one copy.


PS10, Line 154:  descirptors
> descriptors
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 48: class FileCacheTest : public KuduTest {
Consider adding a test of RunDescriptorExpiry.


http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 41: DEFINE_int32(file_cache_expiry_period_ms, 1000,
This seems pretty aggressive.  What's the downside to a longer expiry like a minute or 10 minutes?  Are there that many dropped files piling up?

I'm worried that taking the spinlock and iterating through the map may be


Line 389: FileCache<FileType>::FileCache(const string& cache_name,
Maybe I'm missing it, but where are you actually creating the thread?  There are a bunch of checks that check if it exists, but I wouldn't have expected those, since I don't know why it should be optional.


http://gerrit.cloudera.org:8080/#/c/5146/10/src/kudu/util/file_cache.h
File src/kudu/util/file_cache.h:

Line 103:   FileCache(const std::string& cache_name,
why this change?


PS10, Line 154:  descirptors
descriptors


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#5).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templatizing the cache to one
particular open file interface vs. the current approach. The templatized
cache looked like spaghetti due to calls between templatized methods and
generic methods, and it required (surprisingly) more casting as the
underlying LRU cache isn't templatized. Plus, it violated principle #3 in
that it made it impossible to use a single cache for multiple open file
types, and thus for different parts of Kudu.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,678 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/cache.cc
File src/kudu/util/cache.cc:

PS8, Line 499: Cache* 
> nit: consider returning std::unique_ptr<>
I'm gonna punt on this; it's not in-scope for this patch.


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache-test-util.h
File src/kudu/util/file_cache-test-util.h:

PS8, Line 40: upper_bound_(upper_bound),
> nit: may be, replace upper_bound_ with max_fd_count_ and compute it right a
Done


PS8, Line 45: Start()
> Are any guardrails needed to check if the Start() method is not called more
Done


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

PS8, Line 52: virtual 
> nit: consider dropping 'virtual' since 'override' is present.
Done


PS8, Line 131: NO_FATALS(this->AssertFdsAndDescriptors(this->initial_open_fds_ + 1, 2));
> Does it make sense to check that f2 is invalidated at this point?
You mean, that the in-cache file handle belonging to f2 was evicted from the cache and closed? Yeah, I can do that.


PS8, Line 173: DeleteFile(kFile2)
> What happens if the file is moved/renamed when it's opened by the cache?  I
Moving/renaming a file is totally out of scope, and not an operation done by block managers. If you move a file opened by the cache, I'm sure bad things will happen, as the filename is used as the cache key and is static from the moment the file is opened.


PS8, Line 228: 0
> Is the offset set to 0 intentionally?  Does it make sense to vary the offse
I don't think it matters. The cache certainly doesn't care (the parameters are just passed through).


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

PS8, Line 24: #include "kudu/gutil/casts.h"
> Is this header really needed here?
I had some down_casts in here at one point, but no longer.


PS8, Line 164: handle_.get();
> Why not just
Apparently that doesn't work:

  /home/adar/Source/kudu/src/kudu/util/file_cache.cc: In instantiation of \u2018bool kudu::internal::ScopedOpenedDescriptor<FileType>::opened() const [with FileType = kudu::RWFile]\u2019:
  /home/adar/Source/kudu/src/kudu/util/file_cache.cc:272:22:   required from here
  /home/adar/Source/kudu/src/kudu/util/file_cache.cc:170:32: error: cannot convert \u2018const UniqueHandle {aka const std::unique_ptr<kudu::Cache::Handle, kudu::Cache::HandleDeleter>}\u2019 to \u2018bool\u2019 in return
     bool opened() const { return handle_; }
                                ^


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 177: ScopedOpenedDescriptor<FileType> BaseDescriptor<FileType>::InsertIntoCache(
Any reason not to put this method and the one below inline with the class?


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
......................................................................


Patch Set 10:

> Don't merge yet, I've found a race that needs to be addressed.

The race has been fixed. This new implementation revives the singleton thread that was used to remove expired weak pointers from the descriptor map.

-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] util: add file cache

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#11).

Change subject: util: add file cache
......................................................................

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,628 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>