Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/12/20 19:02:00 UTC

[jira] [Commented] (KUDU-3357) Allow servers to not use the advertised RPC addresses

    [ https://issues.apache.org/jira/browse/KUDU-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649937#comment-17649937 ] 

ASF subversion and git services commented on KUDU-3357:
-------------------------------------------------------

Commit 3f29b5da5f59ea96cfec0608226d5c35740884a6 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3f29b5da5 ]

KUDU-3357 endpoints for proxied RPCs

This patch introduces a solution to the problem outlined in KUDU-3357.

The idea is to establish separate RPC endpoint(s) for Kudu servers to
handle traffic proxied from external network(s).  So, when a Kudu server
receives an RPC request, it has enough information to decide whether
to handle the request as arriving from the internal or some external
network.  All the communications of Kudu components in the cluster
should be routed through the standard RPC endpoints, but the requests
proxied from external networks should be routed through those dedicated
RPC endpoints.  When a Kudu server receives an RPC through such an
endpoint, it can substitute internal RPC addresses of Kudu servers with
the corresponding RPC addresses reachable by the client through a TCP
proxy.
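
The address substitution described above can be sketched roughly as
follows.  This is only an illustration of the idea, not the code from
the patch: the HostPort and ServerAddresses classes, the
addresses_for() method, and all host names and ports are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostPort:
    host: str
    port: int


class ServerAddresses:
    """Hypothetical model of one server's RPC endpoints (not Kudu's classes)."""

    def __init__(self, internal, proxied, proxy_advertised):
        # Standard RPC endpoints used by Kudu components inside the cluster.
        self.internal = list(internal)
        # Dedicated endpoints that receive requests forwarded by a TCP proxy.
        self.proxied = set(proxied)
        # Addresses an external client can reach through the TCP proxy.
        self.proxy_advertised = list(proxy_advertised)

    def addresses_for(self, received_on):
        """Pick the addresses to report, based on the receiving endpoint."""
        if received_on in self.proxied:
            # The request came through the proxy: report external addresses.
            return self.proxy_advertised
        # The request came from the internal network: report internal ones.
        return self.internal


# Hypothetical example values.
server = ServerAddresses(
    internal=[HostPort("tserver-0.internal.example", 7050)],
    proxied=[HostPort("tserver-0.internal.example", 7051)],
    proxy_advertised=[HostPort("kudu.proxy.example", 17050)],
)
```

In this sketch the only input to the decision is the endpoint on which
the request arrived, which matches the idea of keeping the internal and
the proxied traffic on separate endpoints.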

With that, the following new flags have been introduced, both accepting
a comma-separated list of strings of the form '<hostname>:<port>':

--rpc_proxy_advertised_addresses

  That's to set the server's RPC endpoints exposed to the external
  network via a TCP proxy.

--rpc_proxied_addresses

  That's to define RPC endpoints in the inner network to handle
  RPC requests forwarded/proxied from outside networks.  It's possible
  to use a wildcard for the IP address (i.e. 0.0.0.0) and for the port
  number (i.e. 0) in the elements of this address list.
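
To illustrate the value format both flags accept, here is a minimal
sketch of parsing a comma-separated list of '<hostname>:<port>'
strings.  The parse_host_ports() helper and the sample values are
hypothetical; Kudu's own flag parsing may differ in details (this
sketch, for example, requires the port and does not handle IPv6
literals).

```python
def parse_host_ports(flag_value):
    """Split 'host1:port1,host2:port2' into a list of (host, port) tuples."""
    result = []
    for item in flag_value.split(","):
        item = item.strip()
        if not item:
            # Tolerate empty elements such as a trailing comma.
            continue
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected '<hostname>:<port>', got {item!r}")
        # int() raises ValueError if the port is missing or not a number.
        result.append((host, int(port)))
    return result


# Hypothetical values: a wildcard address with a wildcard port, plus
# an explicit host and port.
endpoints = parse_host_ports("0.0.0.0:0,kudu.proxy.example:17050")
```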

The newly introduced --rpc_proxy_advertised_addresses is orthogonal
to the already existing --rpc_advertised_addresses, so it's possible to
use both simultaneously if the network environment for Docker containers
in the private internal network is configured in an unusual way.

This approach separates the internal and the external traffic while
still providing connectivity for Kudu clients running in external
networks: the internal traffic is never routed through a proxy's or
a load balancer's endpoint.  The approach of having only
--rpc_advertised_addresses for public cloud deployments (referred to
by KUDU-3357) routes _all_ the Kudu traffic through the endpoints
exposed by the proxy/load balancer, and that's the problem this
patch addresses.

I verified this patch works as expected in a k8s environment running in
the AWS/EC2 cloud where a Kudu cluster was deployed in a containerized
manner using Kudu Docker images.  In particular, RPC calls from a client
running in the external network (I was running it from my laptop behind
a firewall) were forwarded/proxied via a TCP proxy (NGINX) to Kudu
servers running in an AWS cluster deployed behind a load balancer.
I used the "kudu perf loadgen" CLI tool to create tables and write
data, and "kudu perf table_scan" to read data.  A test Kudu Java client
application worked as well.

NOTE: even though the "kudu cluster ksck" tool also worked, it's not yet
      a goal to be able to run "kudu cluster ksck" and other
      administrative CLI tools from the outside; those tasks are
      expected to be performed from within the Kudu cluster's internal
      network.

Follow-up patches should also add:
  * proper advertising of a proxy/loadbalancer endpoint to be forwarded
    to the embedded web server's endpoint for master and tablet servers
  * support for multi-master configurations when forwarding RPCs
    from external networks

Change-Id: Ic300250556d3f6e522a71923bed6aa5cd45375ea
Reviewed-on: http://gerrit.cloudera.org:8080/19231
Tested-by: Kudu Jenkins
Reviewed-by: Attila Bukor <ab...@apache.org>


> Allow servers to not use the advertised RPC addresses
> -----------------------------------------------------
>
>                 Key: KUDU-3357
>                 URL: https://issues.apache.org/jira/browse/KUDU-3357
>             Project: Kudu
>          Issue Type: Improvement
>          Components: rpc
>            Reporter: Andrew Wong
>            Assignee: Alexey Serbin
>            Priority: Major
>
> When Kudu servers are deployed within an internal network with internal hostnames (e.g. in a k8s cluster), and Kudu clients are deployed outside of this network with a mapping of external traffic to internal ports (e.g. with a load balancer), it’s unclear how to route the Kudu client to the servers without having all traffic (including RPCs between servers) use publicly accessible addresses.
> For instance, all servers could be configured with the --rpc_advertised_addresses configuration. However, since these addresses are used to register servers with the Master, not only would they be used to indicate where clients should look for data, but they would also be used to indicate where replicas should heartbeat to other replicas. This would induce a great deal of traffic on the load balancer.
> We should consider allowing “internal” (i.e. tserver and master) traffic to bypass advertised addresses and use an alternate address. Or at the very least, introduce a policy for selecting which advertised address to use depending on what is available (currently, we always use the first in the list).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)