Posted to mapreduce-user@hadoop.apache.org by Rainer Toebbicke <rt...@pclella.cern.ch> on 2014/12/10 16:03:08 UTC

adding node(s) to Hadoop cluster

Hello,

How would you guys go about adding additional nodes to a Hadoop cluster running with Kerberos, preferably without restarting the namenode/resourcemanager/hbase-master, etc.?

I am aware that one can add names to dfs.hosts and run dfsadmin -refreshNodes, but with Kerberos I have the additional problem that the new hosts' principals have to be added to hadoop.security.auth_to_local (I do not have the luxury of an easy yet secure pattern for host names). Alas, I see no way of propagating changes to that setting to running daemons.
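
For illustration, the kind of per-host mapping I mean looks roughly like this in core-site.xml (host names and realm are placeholders); every new node means another RULE line, and as far as I can tell only a daemon restart picks the change up:

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[2:$1/$2@$0](nn/namenode01.example.org@EXAMPLE.ORG)s/.*/hdfs/
        RULE:[2:$1/$2@$0](dn/worker0815.example.org@EXAMPLE.ORG)s/.*/hdfs/
        RULE:[2:$1/$2@$0](dn/worker0816.example.org@EXAMPLE.ORG)s/.*/hdfs/
        DEFAULT
      </value>
    </property>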

Any ideas?

Thanks, Rainer

Re: adding node(s) to Hadoop cluster

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
On 12 Dec 2014, at 03:13, Vinod Kumar Vavilapalli <vi...@hortonworks.com> wrote:

> Auth to local mappings
> - nn/nn-host@cluster.com -> hdfs
> - dn/.*@cluster.com -> hdfs
> 
> The combination of the above lets you block any user other than hdfs from posing as a datanode.
> 
> Purposes
> - _HOST: Lets you deploy all datanodes with the same principal value in all their configs.
> - Auth-to-local mapping: Maps Kerberos principals to Unix login names to close the loop on identity.
> 
> I don't think your example of "somebody on an untrusted client can disguise as hdfs/nodename@REALM" is possible at all with Kerberos. Any references to such possibilities? If it were possible, all security would be toast anyway, no?
> 

It's not toast yet; the term "untrusted" is a bit harsh, as the "attacker" of course needs a keytab entry for dn/xxx.cluster.com@CLUSTER.COM.

However, take the example of several clusters in an organization of a size such that principal creation is somewhat automated.

With the auth_to_local pattern "dn/.*@CLUSTER.COM -> hdfs" you give the administrators of cluster1 (who can read the keytab of dn/cluster1node.cluster.com@CLUSTER.COM) the ability to act as hdfs on cluster2 through hdfs://cluster2.cluster.com/..., which is not always appropriate.
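
To make that concrete: the pattern boils down to a single rule like the one below, and on cluster2's namenode it maps the datanode principals of both clusters to hdfs (host names as above; the second mapping is exactly the one I would rather not hand out):

    RULE:[2:$1/$2@$0](dn/.*@CLUSTER.COM)s/.*/hdfs/

    dn/cluster2node.cluster.com@CLUSTER.COM  ->  hdfs   (intended)
    dn/cluster1node.cluster.com@CLUSTER.COM  ->  hdfs   (cluster1's keytab works here too)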

Now, I am aware that naming nodes in a cluster-specific way solves this, as you could form a pattern. But believe it or not, here host names are derived from the purchase contract number.

I am also aware that a cluster-specific prefix such as "cluster2-dn/" instead of the constant prefix "dn/" would solve it as well, provided that, by policy (!), principals of the form "cluster2-dn/..." are never created for nodes in cluster1.
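
Sketched out (names are again placeholders), that variant would mean giving cluster2's datanodes a cluster-specific service name and mapping only that name on cluster2:

    <!-- hdfs-site.xml on cluster2's datanodes -->
    <property>
      <name>dfs.datanode.kerberos.principal</name>
      <value>cluster2-dn/_HOST@CLUSTER.COM</value>
    </property>

    <!-- and in cluster2's hadoop.security.auth_to_local, only the prefixed name maps to hdfs -->
    RULE:[2:$1/$2@$0](cluster2-dn/.*@CLUSTER.COM)s/.*/hdfs/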

Neither of these two possibilities is at hand here, though.

But I was wondering how other people address this. At the same time, let me hint that if there were a -refreshAuthToLocal (and, while we're writing the wishlist, a -refreshTopology), I could continue to control access through auth_to_local, which is comparatively painless.

And still... I may have misunderstood something and be making a fuss over nothing.

Thanks for your thoughts, 
Rainer


Re: adding node(s) to Hadoop cluster

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
I may be mistaken, but let me try again with an example to see if we are on the same page.

Principals
 - NameNode: nn/nn-host@cluster.com
 - DataNode: dn/_HOST@cluster.com

Auth to local mappings
 - nn/nn-host@cluster.com -> hdfs
 - dn/.*@cluster.com -> hdfs

The combination of the above lets you block any user other than hdfs from posing as a datanode.
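
Spelled out in core-site.xml, the two mappings above would read roughly as follows (realm spelled the way your KDC spells it):

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[2:$1/$2@$0](nn/nn-host@cluster.com)s/.*/hdfs/
        RULE:[2:$1/$2@$0](dn/.*@cluster.com)s/.*/hdfs/
        DEFAULT
      </value>
    </property>

Rules are tried in order; DEFAULT at the end maps ordinary user@REALM principals in the default realm to their short names.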

Purposes
 - _HOST: Lets you deploy all datanodes with the same principal value in all their configs.
 - Auth-to-local mapping: Maps Kerberos principals to Unix login names to close the loop on identity.

I don't think your example of "somebody on an untrusted client can disguise as hdfs/nodename@REALM" is possible at all with Kerberos. Any references to such possibilities? If it were possible, all security would be toast anyway, no?

+Vinod


> Thanks, I may be mistaken, but I suspect you missed the point:
> 
> For me, auth_to_local's role is to protect the server(s). For example, somebody on an untrusted "client" could masquerade as hdfs/nodename@REALM and hence take over hdfs through a careless principal-to-id translation. A well-configured auth_to_local will deflect that rogue "hdfs" to "nobody" or the like, so a malicious client cannot do an "hdfs dfs -chown ...", for example.
> 
> The _HOST construct indeed makes using the same config files throughout the cluster easier, but as far as I can see it mainly applies to the "client" side.
> 
> On the server, I see no way other than auth_to_local with a list/pattern of trusted node names (on namenode and every datanode in the hdfs case) to prevent the scenario above. Would there be?



Re: adding node(s) to Hadoop cluster

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
On 10 Dec 2014, at 20:08, Vinod Kumar Vavilapalli <vi...@hortonworks.com> wrote:

> You don't need patterns for host names; did you see the support for _HOST in the principal names? You can specify the datanode principal to be, say, datanodeUser/_HOST@realm, and the Hadoop libraries interpret and replace _HOST on each machine with the real host name.

Thanks, I may be mistaken, but I suspect you missed the point:

For me, auth_to_local's role is to protect the server(s). For example, somebody on an untrusted "client" could masquerade as hdfs/nodename@REALM and hence take over hdfs through a careless principal-to-id translation. A well-configured auth_to_local will deflect that rogue "hdfs" to "nobody" or the like, so a malicious client cannot do an "hdfs dfs -chown ...", for example.
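
What I mean by "well-configured" is, roughly, a rule set like this (trusted host names and realm are placeholders): explicit entries for the nodes I trust, followed by a catch-all that sends every other hdfs/... principal to nobody before DEFAULT is ever consulted:

    RULE:[2:$1/$2@$0](hdfs/trustednode1.example.org@EXAMPLE.ORG)s/.*/hdfs/
    RULE:[2:$1/$2@$0](hdfs/trustednode2.example.org@EXAMPLE.ORG)s/.*/hdfs/
    RULE:[2:$1/$2@$0](hdfs/.*@EXAMPLE.ORG)s/.*/nobody/
    DEFAULT

Rules are tried in order and the first match wins, so the catch-all never shadows the trusted entries.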

The _HOST construct indeed makes using the same config files throughout the cluster easier, but as far as I can see it mainly applies to the "client" side.

On the server, I see no way other than auth_to_local with a list/pattern of trusted node names (on namenode and every datanode in the hdfs case) to prevent the scenario above. Would there be?

Thanks, Rainer

Re: adding node(s) to Hadoop cluster

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
> I am aware that one can add names to dfs.hosts and run dfsadmin -refreshNodes, but with Kerberos I have the additional problem that the new hosts' principals have to be added to hadoop.security.auth_to_local (I do not have the luxury of an easy yet secure pattern for host names). Alas, I see no way of propagating changes to that setting to running daemons.


This is how almost all clusters running with security add nodes: add them to dfs.hosts or the YARN host-include file and do a refresh.
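
As a rough sketch (the file paths are just examples; dfs.hosts and yarn.resourcemanager.nodes.include-path point at whatever files your configs name):

    # add the new host to the include files
    echo newnode.example.org >> /etc/hadoop/conf/dfs.hosts.include
    echo newnode.example.org >> /etc/hadoop/conf/yarn.nodes.include

    # make the running daemons re-read them
    hdfs dfsadmin -refreshNodes
    yarn rmadmin -refreshNodes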

You don't need patterns for host names; did you see the support for _HOST in the principal names? You can specify the datanode principal to be, say, datanodeUser/_HOST@realm, and the Hadoop libraries interpret and replace _HOST on each machine with the real host name.
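
For example, with _HOST every datanode can then ship an identical hdfs-site.xml along the lines of the following (realm and keytab path are placeholders):

    <property>
      <name>dfs.datanode.kerberos.principal</name>
      <value>dn/_HOST@CLUSTER.COM</value>
    </property>
    <property>
      <name>dfs.datanode.keytab.file</name>
      <value>/etc/security/keytabs/dn.service.keytab</value>
    </property>

The _HOST token is replaced with the local machine's host name when each daemon logs in from its keytab.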

HTH
+Vinod