You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/05/07 17:12:33 UTC

[kudu-CR] [resolver] TTL cache for DNS resolver

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13266


Change subject: [resolver] TTL cache for DNS resolver
......................................................................

[resolver] TTL cache for DNS resolver

Added TTL cache for the results of DNS resolution.  By default,
the cache is enabled and its capacity is 1 MiByte, while cache's
record TTL is 15 seconds.  Updated corresponding tests as well.

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
---
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
3 files changed, 142 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/13266/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 2:

(1 comment)

> client_symbols-test is failing:
 > 
 > Found bad symbol 'vmem_aligned_alloc'
 > Found bad symbol 'vmem_calloc'
 > Found bad symbol 'vmem_check'
 > Found bad symbol 'vmem_check_version'
 > Found bad symbol 'vmem_create'
 > Found bad symbol 'vmem_create_in_region'
 > Found bad symbol 'vmem_delete'
 > Found bad symbol 'vmem_errormsg'
 > Found bad symbol 'vmem_free'
 > Found bad symbol 'vmem_malloc'
 > Found bad symbol 'vmem_malloc_usable_size'
 > Found bad symbol 'vmem_realloc'
 > Found bad symbol 'vmem_set_funcs'
 > Found bad symbol 'vmem_stats_print'
 > Found bad symbol 'vmem_strdup'
 > Kudu client library contains 15 bad symbols
 > 
 > You have two choices:
 > 1. Do some additional refactoring to convince the linker that even
 > though you're creating a TTLCache which, under the hood, could
 > create an NVM cache, you'll never actually do that, and it
 > shouldn't bring the vmem symbols into the client library.
 > 2. Punt and declare these symbols as local in client/symbols.map.
 > 
 > I'd prefer #1 so as to reduce the size of the client library, but
 > if it's not a huge size difference and if it proves to be
 > challenging, maybe #2 is OK.

Thank you for taking a look at the failures.  I also found that after a couple of attempts to restart Jenkins build job.  However, it required me to run the tests locally using ctest, since dist-test didn't report any issues: http://dist-test.cloudera.org//job?job_id=aserbin.1557248586.130621

Right, it seems it's necessary to do something with nvmcache-related stuff.  Probably, I'll address it later: I didn't expect it to be a non-trivial change like that.

http://gerrit.cloudera.org:8080/#/c/13266/2/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/2/src/kudu/util/net/dns_resolver.h@28
PS2, Line 28: #include "kudu/util/ttl_cache.h"
> Could you forward declare TTLCache instead? Or does that not work because o
I relied on IWYU suggestion here.  But it's possible to have it forward-declared, as I understand, yes.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 May 2019 03:48:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13266

to look at the new patch set (#5).

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................

KUDU-2791: TTL cache in DNS resolver (part 1)

Added TTL cache for the results of name resolution in DnsResolver.
Updated corresponding tests as well.  A follow-up changelist will
add necessary plumbing in various places where it's beneficial to
use caching DnsResolver instead of HostPort::ResolveAddresses().

This changelist also introduces runtime flags which map into parameters
of the TTL cache used in DnsResolver. By default, the cache's capacity
is 1 MiByte, and its records' TTL is 15 seconds.

1 MiByte capacity is big enough to accommodate thousands of records.
From the other side, it's low enough to avoid running an extra thread
for scrubbing the cache of expired records, allowing the cache to purge
expired records only when it is at capacity.

Record's TTL of 15 seconds is half of the minimum recommended as stated
by various 'best practices': 30 seconds for DNS A records
(see [1] -- [4] below).  Kudu masters and tablet servers are not
supposed to run behind a load balancer, so 15 seconds TTL for a cached
DNS A record seems to be good enough in the context of running a Kudu
cluster.

[1] https://ns1.com/knowledgebase/ttl-best-practices
[2] https://serverfault.com/questions/7478/recommended-dns-ttl
[3] https://support.google.com/a/answer/48090?hl=en
[4] https://ns1.com/blog/what-is-the-lowest-ttl-i-can-get-away-with

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
---
M src/kudu/client/client.cc
M src/kudu/client/meta_cache.cc
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
M src/kudu/util/net/net_util.cc
6 files changed, 226 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/13266/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13266

to look at the new patch set (#3).

Change subject: WIP [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................

WIP [dns_resolver] KUDU-2791 TTL cache in DNS resolver

Added TTL cache for the results of DNS resolution.  Updated
corresponding tests as well.

By default, the cache is enabled, its capacity is 1 MiByte, and
its records' TTL is 15 seconds.

1 MiByte capacity is big enough to accommodate thousands of records.
From the other side, it's low enough to avoid running an extra thread
for scrubbing the cache of expired records, allowing the cache to purge
expired records only when it is at capacity.

Record's TTL of 15 seconds is half of the minimum recommended as stated
by various 'best practices': 30 seconds for DNS A records
(see [1] -- [4] below).  Kudu masters and tablet servers are not
supposed to run behind a load balancer, so 15 seconds TTL for a cached
DNS A record seems to be good enough in the context of running a Kudu
cluster.

[1] https://ns1.com/knowledgebase/ttl-best-practices
[2] https://serverfault.com/questions/7478/recommended-dns-ttl
[3] https://support.google.com/a/answer/48090?hl=en
[4] https://ns1.com/blog/what-is-the-lowest-ttl-i-can-get-away-with

WIP: I need to figure out how to avoid linking libmemkind into
     libkudu_client

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
---
M src/kudu/client/client-internal.cc
M src/kudu/client/client-internal.h
M src/kudu/client/client.cc
M src/kudu/client/meta_cache.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/kserver/kserver.cc
M src/kudu/kserver/kserver.h
M src/kudu/master/master_service.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/ts_descriptor-test.cc
M src/kudu/master/ts_descriptor.cc
M src/kudu/master/ts_descriptor.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
M src/kudu/util/net/net_util.cc
27 files changed, 378 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/13266/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver-test.cc
File src/kudu/util/net/dns_resolver-test.cc:

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver-test.cc@88
PS4, Line 88:   LOG(INFO) << Substitute("$0 non-cached resolutions of '$1' took $2",
> Are 1000 iterations of non-cached DNS lookup fast enough for this test?
The test tries to resolve localhost, so it's fast enough.  It usually takes under half a second (at least for my macOS build).


http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.h@82
PS4, Line 82:   typedef TTLCache<std::string, std::vector<Sockaddr>> HostRecordCache;
> Should doc what the keys and values are.
Done


http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.cc
File src/kudu/util/net/dns_resolver.cc:

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.cc@105
PS4, Line 105: resolved_addresses
> Should use cached_addresses here for consistency.
Ah, indeed.  Good catch!



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 30 May 2019 07:10:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: WIP [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 3:

> (1 comment)
 > 
 > > (1 comment)
 > >
 > > Could you split up the patch into at least two parts?
 > > 1. Making DnsResolver the class you need it to be.
 > > 2. Plumbing it into new areas of the codebase?
 > 
 > Yep.  Actually, I started like that but I was not sure that
 > wrapping DnsResolver::ResolveAddresses(const HostPort&,
 > std::vector<SockAddr>*) into a non-static method would be welcome. 
 > But if it's explicitly requested -- sure, I'll do.

Ah, yes -- ResolveAddresses() would not be a static method once adding the TTL cache.  I'm not sure why my train of thought went that way, but it seems I started re-factoring with just splitting into sync and async interfaces.


-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 28 May 2019 23:54:13 +0000
Gerrit-HasComments: No

[kudu-CR] WIP KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: WIP KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h@73
PS3, Line 73: class AsyncDnsResolver: public DnsResolver {
> I looked at the implementation and I see what you're saying. But doesn't th
SGTM.  I merged sync and async interfaces into a single class.  However, I don't use the pool&synchronizer approach for synchronous resolution since it's not necessary as of now.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 30 May 2019 01:29:33 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13266

to look at the new patch set (#7).

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................

KUDU-2791: TTL cache in DNS resolver (part 1)

Added TTL cache for the results of name resolution in DnsResolver.
Updated corresponding tests as well.  A follow-up changelist will
add necessary plumbing in various places where it's beneficial to
use caching DnsResolver instead of HostPort::ResolveAddresses().

This changelist also introduces runtime flags which map into parameters
of the TTL cache used in DnsResolver. By default, the cache's capacity
is 1 MiByte, and its records' TTL is 15 seconds.

1 MiByte capacity is big enough to accommodate thousands of records.
From the other side, it's low enough to avoid running an extra thread
for scrubbing the cache of expired records, allowing the cache to purge
expired records only when it is at capacity.

Record's TTL of 15 seconds is half of the minimum recommended as stated
by various 'best practices': 30 seconds for DNS A records
(see [1] -- [4] below).  Kudu masters and tablet servers are not
supposed to run behind a load balancer, so 15 seconds TTL for a cached
DNS A record seems to be good enough in the context of running a Kudu
cluster.

[1] https://ns1.com/knowledgebase/ttl-best-practices
[2] https://serverfault.com/questions/7478/recommended-dns-ttl
[3] https://support.google.com/a/answer/48090?hl=en
[4] https://ns1.com/blog/what-is-the-lowest-ttl-i-can-get-away-with

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
---
M src/kudu/client/client-internal.cc
M src/kudu/client/client-internal.h
M src/kudu/client/client.cc
M src/kudu/client/meta_cache.cc
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
M src/kudu/util/net/net_util.cc
8 files changed, 233 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/13266/7
-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG@10
PS1, Line 10: the cache is enabled and its capacity is 1 MiByte, while cache's
> Wow, I appreciate the thorough description. Thank you.
Thank you for the feedback!  It seems there is some issue with  the tests run by Jenkins, but I didn't see any failure while running it using dist-test.  I'll take a closer look.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 07 May 2019 20:17:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13266/6/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/6/src/kudu/util/net/dns_resolver.h@39
PS6, Line 39: An
> Nit: A
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 31 May 2019 01:53:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 2:

> (2 comments)
 > 
 > In and of itself this doesn't address KUDU-2791 because DnsResolver
 > is only used in certain parts of the C++ client. You'll need to do
 > some more refactoring (i.e. chase down all calls to
 > HostPort::ResolveAddresses) for the cache to be used universally.

That's a good observation.  For some reason, I naively assumed that all DNS resolution is done by DnsResolver, so I could easily get away with adding TTL cache there.  But apparently, that's not the case.

Probably, that means the DNS cache should be a singleton accessed by  HostPort::ResolveAddresses(), or the code could be refactored to use DnsResolver elsewhere, but then it needs to have 'synchronous' mode or alike.


-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 May 2019 03:59:33 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................

KUDU-2791: TTL cache in DNS resolver (part 1)

Added TTL cache for the results of name resolution in DnsResolver.
Updated corresponding tests as well.  A follow-up changelist will
add necessary plumbing in various places where it's beneficial to
use caching DnsResolver instead of HostPort::ResolveAddresses().

This changelist also introduces runtime flags which map into parameters
of the TTL cache used in DnsResolver. By default, the cache's capacity
is 1 MiByte, and its records' TTL is 15 seconds.

1 MiByte capacity is big enough to accommodate thousands of records.
From the other side, it's low enough to avoid running an extra thread
for scrubbing the cache of expired records, allowing the cache to purge
expired records only when it is at capacity.

Record's TTL of 15 seconds is half of the minimum recommended as stated
by various 'best practices': 30 seconds for DNS A records
(see [1] -- [4] below).  Kudu masters and tablet servers are not
supposed to run behind a load balancer, so 15 seconds TTL for a cached
DNS A record seems to be good enough in the context of running a Kudu
cluster.

[1] https://ns1.com/knowledgebase/ttl-best-practices
[2] https://serverfault.com/questions/7478/recommended-dns-ttl
[3] https://support.google.com/a/answer/48090?hl=en
[4] https://ns1.com/blog/what-is-the-lowest-ttl-i-can-get-away-with

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Reviewed-on: http://gerrit.cloudera.org:8080/13266
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/client/client-internal.cc
M src/kudu/client/client-internal.h
M src/kudu/client/client.cc
M src/kudu/client/meta_cache.cc
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
M src/kudu/util/net/net_util.cc
8 files changed, 233 insertions(+), 60 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: WIP KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................


Patch Set 4:

(3 comments)

All of the builds failed with:

18:31:52 [ 69%] Linking CXX shared library ../../../lib/exported/libkudu_client.so
18:31:52 /usr/bin/ld: ../../../../../thirdparty/installed/uninstrumented/lib/libmemkind.a(memkind.o): relocation R_X86_64_32 against `.data' can not be used when making a shared object; recompile with -fPIC
18:31:52 ../../../../../thirdparty/installed/uninstrumented/lib/libmemkind.a: error adding symbols: Bad value

This suggests that we aren't building memkind with -fPIC. We should either do that, or make it impossible for the memkind symbols to show up in the exported client library. Unfortunately this change introduces that link by adding references to Cache inside DnsResolver.

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver-test.cc
File src/kudu/util/net/dns_resolver-test.cc:

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver-test.cc@88
PS4, Line 88:   LOG(INFO) << Substitute("$0 non-cached resolutions of '$1' took $2",
Are 1000 iterations of non-cached DNS lookup fast enough for this test?


http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.h@82
PS4, Line 82:   typedef TTLCache<std::string, std::vector<Sockaddr>> HostRecordCache;
Should doc what the keys and values are.


http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.cc
File src/kudu/util/net/dns_resolver.cc:

http://gerrit.cloudera.org:8080/#/c/13266/4/src/kudu/util/net/dns_resolver.cc@105
PS4, Line 105: resolved_addresses
Should use cached_addresses here for consistency.

But I think the more accurate way to account for the charge here is:

  kudu_malloc_usable_size(cached_addresses) + cached_addresses.capacity() > 0 ? kudu_malloc_usable_size(cached_addresses.data()) : 0;



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 30 May 2019 03:00:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 2:

> Probably, that means the DNS cache should be a singleton accessed by  HostPort::ResolveAddresses(), or the code could be refactored to use DnsResolver elsewhere, but then it needs to have 'synchronous' mode or alike.

Would prefer not to introduce a new singleton (though of course that depends on the plumbing challenges involved).

Synchronous mode shouldn't be bad; just a new function call that calls the async function with a Synchronizer.


-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 May 2019 04:21:50 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13266/6/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/6/src/kudu/util/net/dns_resolver.h@39
PS6, Line 39: An
Nit: A



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 30 May 2019 23:54:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13266

to look at the new patch set (#2).

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................

[dns_resolver] KUDU-2791 TTL cache in DNS resolver

Added TTL cache for the results of DNS resolution.  Updated
corresponding tests as well.

By default, the cache is enabled, its capacity is 1 MiByte, and
its records' TTL is 15 seconds.

1 MiByte capacity is big enough to accommodate thousands of records.
From the other side, it's low enough to avoid running an extra thread
for scrubbing the cache of expired records, allowing the cache to purge
expired records only when it is at capacity.

Record's TTL of 15 seconds is half of the minimum recommended as stated
by various 'best practices': 30 seconds for DNS A records
(see [1] -- [4] below).  Kudu masters and tablet servers are not
supposed to run behind a load balancer, so 15 seconds TTL for a cached
DNS A record seems to be good enough in the context of running a Kudu
cluster.

[1] https://ns1.com/knowledgebase/ttl-best-practices
[2] https://serverfault.com/questions/7478/recommended-dns-ttl
[3] https://support.google.com/a/answer/48090?hl=en
[4] https://ns1.com/blog/what-is-the-lowest-ttl-i-can-get-away-with

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
---
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
3 files changed, 142 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/13266/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 31 May 2019 04:24:43 +0000
Gerrit-HasComments: No

[kudu-CR] [resolver] TTL cache for DNS resolver

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [resolver] TTL cache for DNS resolver
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG@7
PS1, Line 7: [resolver] TTL cache for DNS resolver
Add KUDU-2791 to the commit message.


http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG@10
PS1, Line 10: the cache is enabled and its capacity is 1 MiByte, while cache's
Is there some logic you used to choose the default capacity and ttl? Did you test various values?



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 07 May 2019 17:28:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 2:

client_symbols-test is failing:

  Found bad symbol 'vmem_aligned_alloc'
  Found bad symbol 'vmem_calloc'
  Found bad symbol 'vmem_check'
  Found bad symbol 'vmem_check_version'
  Found bad symbol 'vmem_create'
  Found bad symbol 'vmem_create_in_region'
  Found bad symbol 'vmem_delete'
  Found bad symbol 'vmem_errormsg'
  Found bad symbol 'vmem_free'
  Found bad symbol 'vmem_malloc'
  Found bad symbol 'vmem_malloc_usable_size'
  Found bad symbol 'vmem_realloc'
  Found bad symbol 'vmem_set_funcs'
  Found bad symbol 'vmem_stats_print'
  Found bad symbol 'vmem_strdup'
  Kudu client library contains 15 bad symbols

You have two choices:
1. Do some additional refactoring to convince the linker that even though you're creating a TTLCache which, under the hood, could create an NVM cache, you'll never actually do that, and it shouldn't bring the vmem symbols into the client library.
2. Punt and declare these symbols as local in client/symbols.map.

I'd prefer #1 so as to reduce the size of the client library, but if it's not a huge size difference and if it proves to be challenging, maybe #2 is OK.


-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 07 May 2019 23:26:14 +0000
Gerrit-HasComments: No

[kudu-CR] WIP [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: WIP [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h@73
PS3, Line 73: class AsyncDnsResolver: public DnsResolver {
> I thought it makes sense to have a synchronous-only version.  The idea was 
I looked at the implementation and I see what you're saying. But doesn't that threadpool thread only spawn if needed? That is, if the synchronous-only version _does not_ use the threadpool, there isn't any thread overhead? Nor is there any overhead when the pool goes idle and the thread is killed?

Thinking about it some more, I think this would be sufficient:

  // Uses a threadpool with up to N threads to do resolution.
  void ResolveAddressesAsync(const HostPort& hostport,
                             std::vector<Sockaddr>* addresses,
                             const StatusCallback& cb);

  void ResolveAddresses(const HostPort& hostport,
                        std::vector<Sockaddr>* addresses) {
    Synchronizer s;
    ResolveAddressesAsync(hostport, addresses, s.StatusCallback());
    return s.Wait();
  }

When idle, the pool's threads are killed. When active, they're running. Maybe we can improve on this, but I don't see why that should mandate a DnsResolver subclass; ultimately the interface is the same. We can get clever under the hood and not go through the pool for synchronous calls, but again that has no bearing on the interface.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 29 May 2019 00:15:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: WIP [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 3:

(1 comment)

Could you split up the patch into at least two parts?
1. Making DnsResolver the class you need it to be.
2. Plumbing it into new areas of the codebase?

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h@73
PS3, Line 73: class AsyncDnsResolver: public DnsResolver {
Why model this as a subclass? Why can't we just add async methods to DnsResolver?



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 28 May 2019 21:38:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13266

to look at the new patch set (#6).

Change subject: KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................

KUDU-2791: TTL cache in DNS resolver (part 1)

Added TTL cache for the results of name resolution in DnsResolver.
Updated corresponding tests as well.  A follow-up changelist will
add necessary plumbing in various places where it's beneficial to
use caching DnsResolver instead of HostPort::ResolveAddresses().

This changelist also introduces runtime flags which map into parameters
of the TTL cache used in DnsResolver. By default, the cache's capacity
is 1 MiByte, and its records' TTL is 15 seconds.

1 MiByte capacity is big enough to accommodate thousands of records.
From the other side, it's low enough to avoid running an extra thread
for scrubbing the cache of expired records, allowing the cache to purge
expired records only when it is at capacity.

Record's TTL of 15 seconds is half of the minimum recommended as stated
by various 'best practices': 30 seconds for DNS A records
(see [1] -- [4] below).  Kudu masters and tablet servers are not
supposed to run behind a load balancer, so 15 seconds TTL for a cached
DNS A record seems to be good enough in the context of running a Kudu
cluster.

[1] https://ns1.com/knowledgebase/ttl-best-practices
[2] https://serverfault.com/questions/7478/recommended-dns-ttl
[3] https://support.google.com/a/answer/48090?hl=en
[4] https://ns1.com/blog/what-is-the-lowest-ttl-i-can-get-away-with

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
---
M src/kudu/client/client-internal.cc
M src/kudu/client/client-internal.h
M src/kudu/client/client.cc
M src/kudu/client/meta_cache.cc
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
M src/kudu/util/net/net_util.cc
8 files changed, 233 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/13266/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG@10
PS1, Line 10: the cache is enabled and its capacity is 1 MiByte, while cache's
> I added a few sentences trying to explain the choice.
Wow, I appreciate the thorough description. Thank you.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 07 May 2019 20:03:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG@7
PS1, Line 7: [resolver] TTL cache for DNS resolver
> Add KUDU-2791 to the commit message.
Done


http://gerrit.cloudera.org:8080/#/c/13266/1//COMMIT_MSG@10
PS1, Line 10: the cache is enabled and its capacity is 1 MiByte, while cache's
> Is there some logic you used to choose the default capacity and ttl? Did yo
I added a few sentences trying to explain the choice.

The functionality of the TTL cache w.r.t. entries expiration given various TTL values is covered by TTLCache's tests.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 07 May 2019 19:10:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 2:

(2 comments)

In and of itself this doesn't address KUDU-2791 because DnsResolver is only used in certain parts of the C++ client. You'll need to do some more refactoring (i.e. chase down all calls to HostPort::ResolveAddresses) for the cache to be used universally.

http://gerrit.cloudera.org:8080/#/c/13266/2/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/2/src/kudu/util/net/dns_resolver.h@28
PS2, Line 28: #include "kudu/util/ttl_cache.h"
Could you forward declare TTLCache instead? Or does that not work because of unique_ptr's rules around deleters?


http://gerrit.cloudera.org:8080/#/c/13266/2/src/kudu/util/net/dns_resolver.cc
File src/kudu/util/net/dns_resolver.cc:

http://gerrit.cloudera.org:8080/#/c/13266/2/src/kudu/util/net/dns_resolver.cc@43
PS2, Line 43: 
Note that these aren't configurable via gflag in applications that embed the Kudu C++ client library. If you want them to be configurable, you'll need to add explicit C++ APIs for them. But we probably don't need to do that on day one.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 07 May 2019 23:42:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-2791: TTL cache in DNS resolver (part 1)

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13266

to look at the new patch set (#4).

Change subject: WIP KUDU-2791: TTL cache in DNS resolver (part 1)
......................................................................

WIP KUDU-2791: TTL cache in DNS resolver (part 1)

Added TTL cache for the results of name resolution in DnsResolver.
Updated corresponding tests as well.  A follow-up changelist will
add necessary plumbing in various places where it's beneficial to
use caching DnsResolver instead of HostPort::ResolveAddresses().

This changelist also introduces runtime flags which map into parameters
of the TTL cache used in DnsResolver. By default, the cache's capacity
is 1 MiByte, and its records' TTL is 15 seconds.

1 MiByte capacity is big enough to accommodate thousands of records.
From the other side, it's low enough to avoid running an extra thread
for scrubbing the cache of expired records, allowing the cache to purge
expired records only when it is at capacity.

Record's TTL of 15 seconds is half of the minimum recommended as stated
by various 'best practices': 30 seconds for DNS A records
(see [1] -- [4] below).  Kudu masters and tablet servers are not
supposed to run behind a load balancer, so 15 seconds TTL for a cached
DNS A record seems to be good enough in the context of running a Kudu
cluster.

[1] https://ns1.com/knowledgebase/ttl-best-practices
[2] https://serverfault.com/questions/7478/recommended-dns-ttl
[3] https://support.google.com/a/answer/48090?hl=en
[4] https://ns1.com/blog/what-is-the-lowest-ttl-i-can-get-away-with

WIP: I need to figure out how to avoid linking libmemkind into
     libkudu_client.  Probably, I'll move cache-related
     instantiations with NVM backing into a separate library.

Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
---
M src/kudu/client/client.cc
M src/kudu/client/meta_cache.cc
M src/kudu/util/net/dns_resolver-test.cc
M src/kudu/util/net/dns_resolver.cc
M src/kudu/util/net/dns_resolver.h
M src/kudu/util/net/net_util.cc
6 files changed, 219 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/13266/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP [dns resolver] KUDU-2791 TTL cache in DNS resolver

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13266 )

Change subject: WIP [dns_resolver] KUDU-2791 TTL cache in DNS resolver
......................................................................


Patch Set 3:

(1 comment)

> (1 comment)
 > 
 > Could you split up the patch into at least two parts?
 > 1. Making DnsResolver the class you need it to be.
 > 2. Plumbing it into new areas of the codebase?

Yep.  Actually, I started like that but I was not sure that wrapping DnsResolver::ResolveAddresses(const HostPort&, std::vector<SockAddr>*) into a non-static method would be welcome.  But if it's explicitly requested -- sure, I'll do.

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h
File src/kudu/util/net/dns_resolver.h:

http://gerrit.cloudera.org:8080/#/c/13266/3/src/kudu/util/net/dns_resolver.h@73
PS3, Line 73: class AsyncDnsResolver: public DnsResolver {
> Why model this as a subclass? Why can't we just add async methods to DnsRes
I thought it makes sense to have a synchronous-only version.  The idea was to use the synchronous-only version in all the places where we use HostPort::ResolveAddresses() -- that would allow to be type-safe and avoid spawning an extra thread where it's not needed.

I hope it sounds reasonable, but I'm not against the idea of using the same pattern everywhere if we don't care much about an extra thread.



-- 
To view, visit http://gerrit.cloudera.org:8080/13266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1bbd55a8231fd541d2087f9202f24e80bc79f0b
Gerrit-Change-Number: 13266
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 28 May 2019 23:48:43 +0000
Gerrit-HasComments: Yes