You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2020/11/16 18:15:00 UTC

[jira] [Created] (HBASE-25292) Update InetSocketAddress usage discipline

Andrew Kyle Purtell created HBASE-25292:
-------------------------------------------

             Summary: Update InetSocketAddress usage discipline
                 Key: HBASE-25292
                 URL: https://issues.apache.org/jira/browse/HBASE-25292
             Project: HBase
          Issue Type: Bug
          Components: Client, HFile
            Reporter: Andrew Kyle Purtell


We sometimes cache InetSocketAddress in data structures in an attempt to optimize away potential nameservice (DNS) lookups. This is, in general, an anti-pattern, because once an InetSocketAddress is resolved, resolution is never attempted again. The ideal pattern for connect() is ISA instantiation just before the connect() call, with no reuse of the ISA instance. For bind() we presume the local identity won't change while the process is live so usage and caching can be relaxed in that case.

If I can restate my proposal for a usage convention for InetSocketAddress, it would be this: Network identities should be bound late. This means addresses should be resolved at the last possible moment. Also, network identity mappings can change, so our code should not inappropriately cache them; otherwise we might miss a change and fail to operate normally.

I have reviewed the code for InetSocketAddress usage and in my opinion sometimes we are caching ISA acceptably, and in other cases we are not.

Correct cases:
 * We cache ISA for RPC connections, so we don't potentially do a lookup for every Call. However, we resolve the address earlier than we need to. The code can be improved by moving resolution to just before where we connect().

Incorrect cases that can be fixed:
 * RPC stubs. Remote clients may be recycled and replaced with new instances where the network identities (DNS name to IP address mapping) have changed--. HBASE-14544 attempts to work around DNS instability in data centers of years past in a way that, in my opinion, is the wrong thing to do in the modern era. This is just a technical opinion and not critical to the rest of the proposal. That said, I intend to propose a revert of HBASE-14544. Reverting this simplifies some code a bit. (If this part of the proposal is controversial it can be dropped.) When looking up the IP address of the remote host when creating a stub key we also make a key even if the resolution fails. This is the wrong thing to do. If we can't resolve the remote address, we can't contact the server. Making a stub that can't communicate is pointless. Throw an exception instead.
 * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't have to make up a list right away and cache them forever. We can use Address to record the list of favored nodes and convert from Address to InetSocketAddress on demand (when we go to create the HFile). This will allow us to resolve datanode hostnames just before they are needed. In public cloud, kubernetes, and or some private datacenter service deployment options, datanode servers may have their network identities (DNS name -> IP address mapping) changed over time. We can and should avoid inappropriate caching that may cause us to indefinitely use an incorrect address when contacting a favored node. 
 * Sometimes we use ISA when Address is just as good. For example, the dead servers list. If we are going to pay some attention to ISA usage discipline, let's remove the cases where we use ISA as a host and port pair but do not need to do so. Address works just as well and doesn't present an opportunity for misuse. Another example would be the RPC client concurrentCounterCache.

Incorrect cases that cannot be fixed:
 * hbase-external-blockcache: We have to resolve all of the memcached locations up front because the memcached client constructor requires ISA instances. So we have to hope that the network identities (DNS name -> IP address mapping) does not change for any in the list. This is beyond our control.

While in this area it is trivial to add new client connect metrics for number of potential nameservice lookups (whenever we instantiate an ISA) and number of failed nameservice lookups (if the instantiated ISA is unresolved).

While in this area I also noticed we often directly access a field in ConnectionId where there is also a getter, so good practice is to use the getter instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)