Posted to hdfs-dev@hadoop.apache.org by Lars Francke <la...@gmail.com> on 2022/06/24 11:55:03 UTC

HDFS-16577 - Request for input - Let administrator override connection details when registering datanodes

Hi everyone,

we're trying to get HDFS running in Kubernetes using Kerberos.
This comes with some challenges, as you might expect.
We have created an issue for that including a spike:
https://issues.apache.org/jira/browse/HDFS-16577

Currently (as of 3.2.2; judging from the release notes, this doesn't seem
to have changed since then), DataNodes use the same properties both to
decide which port to bind each service to and to decide which ports are
included in the `DatanodeRegistration` sent to the NameNode. Further,
NameNodes overwrite the DataNode's IP address with the source address of
the incoming registration request.

Both of these behaviors prevent external users from connecting to DataNodes
that are hosted behind some sort of NAT (as is the case in Kubernetes).
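
To make the two behaviors concrete, here is a rough Python model of the
registration flow as described above. It is not HDFS code: `Registration`,
`datanode_build_registration` and `namenode_register` are hypothetical names,
the fields only loosely mirror the ipAddr/xferPort/infoPort/ipcPort fields of
HDFS's DatanodeID, and the addresses are made-up examples of a pod IP.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Registration:
    """Hypothetical stand-in for the DatanodeRegistration fields discussed here."""
    ip_addr: str
    xfer_port: int  # data transfer port (dfs.datanode.address)
    info_port: int  # HTTP port (dfs.datanode.http.address)
    ipc_port: int   # IPC port (dfs.datanode.ipc.address)


def datanode_build_registration(bind_ip: str, bound_ports: dict) -> Registration:
    # Model of the current behavior: the values used to bind the sockets
    # are exactly what gets advertised to the NameNode.
    return Registration(
        ip_addr=bind_ip,
        xfer_port=bound_ports["xfer"],
        info_port=bound_ports["info"],
        ipc_port=bound_ports["ipc"],
    )


def namenode_register(reg: Registration, rpc_source_ip: str) -> Registration:
    # Model of the NameNode side: the reported IP is replaced with the
    # source address of the registration request.
    return replace(reg, ip_addr=rpc_source_ip)


# Example: a DataNode in a pod, registering from its (cluster-internal) pod IP.
pod_ip = "10.244.1.17"
bound = {"xfer": 9866, "info": 9864, "ipc": 9867}
reg = namenode_register(datanode_build_registration(pod_ip, bound), rpc_source_ip=pod_ip)
print(reg)
```

In this model, a client outside the cluster would be handed 10.244.1.17:9866,
a pod address and container port it cannot reach, and nothing on the DataNode
side can change either value.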

We'd like to go ahead with a proper implementation/PR, but we thought we'd
ask for comments/feedback first. Maybe someone else has already done some
work here that we missed.

Thank you!

Cheers,
Lars

Re: HDFS-16577 - Request for input - Let administrator override connection details when registering datanodes

Posted by Lars Francke <la...@gmail.com>.
Chris,

thank you very much for your quick response.

My colleague (who raised the issue) tried to use that functionality, but it
wasn't a good fit for Kubernetes because it didn't let us remap ports.
We bind to a constant port inside the pod and let Kubernetes assign us an
exposed NodePort, but HDFS is hard-coded to always advertise the port of the
bound socket.

The multihoming feature approaches it from a kind of backwards angle
("instead of binding to the advertised address, bind to this!").
Kafka (and maybe others) do it the other way around: "instead of
advertising the bound address, advertise this".
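
The difference can be sketched in a few lines. This is a hypothetical Python
model, not an existing HDFS or Kafka API: `Endpoint`, `Listener` and
`advertised_endpoint` are made-up names, and the override is modelled loosely
on Kafka's `listeners` / `advertised.listeners` split, where the bound and the
advertised address are configured separately. The node IP and NodePort below
are example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


@dataclass(frozen=True)
class Listener:
    bind: Endpoint                        # what the server socket binds to
    advertise: Optional[Endpoint] = None  # hypothetical override sent to clients


def advertised_endpoint(listener: Listener) -> Endpoint:
    # "Instead of advertising the bound address, advertise this":
    # fall back to the bound address only when no override is configured.
    return listener.advertise if listener.advertise is not None else listener.bind


# Bind to a constant port inside the pod...
inside_pod = Endpoint("0.0.0.0", 9866)
# ...but advertise the node IP and the NodePort that Kubernetes assigned.
nodeport = Endpoint("192.168.0.5", 31866)

print(advertised_endpoint(Listener(bind=inside_pod)))                      # bound address advertised
print(advertised_endpoint(Listener(bind=inside_pod, advertise=nodeport)))  # override advertised
```

In a NodePort setup the advertised port differs from the bound one (31866
versus 9866 here), which is the port remapping this thread says is missing.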

We need to be careful to not implement the same thing twice, I agree. But
the port functionality is definitely missing.

We just wanted to make sure that this is something worthwhile (we believe
so) before starting the proper implementation/proposal.

Cheers,
Lars

On Fri, Jun 24, 2022 at 6:31 PM Chris Nauroth <cn...@apache.org> wrote:

> Hello Lars,
>
> I can't say I've personally run HDFS on Kubernetes with Kerberos enabled.
> However, some of the issues you raise sound like they have some overlap
> with the HDFS multi-homing features:
>
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
>
> Have you seen this? Does anything look helpful there?
>
> Chris Nauroth

Re: HDFS-16577 - Request for input - Let administrator override connection details when registering datanodes

Posted by Chris Nauroth <cn...@apache.org>.
Hello Lars,

I can't say I've personally run HDFS on Kubernetes with Kerberos enabled.
However, some of the issues you raise sound like they have some overlap
with the HDFS multi-homing features:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html

Have you seen this? Does anything look helpful there?

Chris Nauroth
