Posted to mapreduce-user@hadoop.apache.org by Devin Suiter RDX <ds...@rdx.com> on 2014/01/14 19:44:17 UTC

Federated Namespaces - VM

Hi,

I just want to throw out a discussion topic on federation.

Reading *The Definitive Guide* on HDFS, it sounds like when federating,
every distinct namespace needs a distinct namenode machine instance.

This means if a company wanted three namespaces, say retail, commercial,
government, they would have to have a host machine (or machine pair for
high-availability) for each one, so 3 (pair) namenode hosts?

What if a company was hosting client data? Say they had 20 clients
accessing a cluster. 20 namespaces minimum, would mean 20 servers just for
namenodes?

At what point in this situation would it become practical to begin
virtualizing namenodes on a high-powered virtualization cluster? I think
some calculation would have to go into it, weighing the expected size
of the namespace partition vs. block density vs. memory...there would also
be the obvious question of resource contention and the overall system drag
caused by that...

What do other community members think?

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

Re: Federated Namespaces - VM

Posted by Nitin Pawar <ni...@gmail.com>.
This is my understanding, and I could be wrong:  :)

You do not really need a separate hardware instance for each namespace
unless each one is as busy as a single-namespace HDFS cluster.

You can set up multiple namenodes on a single machine, each with its own
config, namenode directories, and log directories.
But then if that particular machine goes down, all your namespaces will
be down, which is not a good situation in a client-facing cluster.

In my experience (a couple of years back), a Hadoop cluster on a virtual
cluster does not perform as well as one on real machines. This may have
changed in the last two years, as virtualization has been developed
extensively as well.

So in the end it is more a matter of day-to-day monitoring of how your
clusters are being utilized, and then deciding which namenodes can be
co-hosted and which need to be given a full hardware instance.
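
As a rough illustration of the sizing calculation raised in the original
question, here is a minimal Python sketch. It assumes the commonly quoted
rule of thumb of about 150 bytes of namenode heap per namespace object
(file, directory, or block); that figure, the function names, the
blocks-per-file ratio, and the headroom factor are all assumptions for
illustration, not measured values or anything from Hadoop itself.

```python
# Rough namenode heap estimate to help decide which namespaces could share a host.
# ASSUMPTION: ~150 bytes of heap per namespace object (file, directory, or block).
# This is a rule of thumb only; real usage depends on path lengths, version, etc.
BYTES_PER_OBJECT = 150


def estimate_heap_bytes(files, directories, blocks_per_file=1.5):
    """Estimate heap needed by one namespace (hypothetical helper)."""
    blocks = files * blocks_per_file
    return int((files + directories + blocks) * BYTES_PER_OBJECT)


def fits_on_host(namespaces, host_heap_bytes, headroom=0.5):
    """Return True if all namespaces fit in `headroom` of the host's heap.

    `namespaces` is a list of dicts with the keyword arguments of
    estimate_heap_bytes. `headroom` leaves room for growth and GC overhead;
    0.5 is an arbitrary illustrative choice.
    """
    needed = sum(estimate_heap_bytes(**ns) for ns in namespaces)
    return needed <= host_heap_bytes * headroom


if __name__ == "__main__":
    # 20 hypothetical client namespaces of one million files each.
    clients = [{"files": 1_000_000, "directories": 100_000} for _ in range(20)]
    print(fits_on_host(clients, host_heap_bytes=16 * 1024**3))
```

This only covers memory. It says nothing about RPC load, resource
contention between co-hosted namenodes, or the single point of failure
mentioned above, so it is at best a first filter before monitoring a real
workload.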


On Wed, Jan 15, 2014 at 12:14 AM, Devin Suiter RDX <ds...@rdx.com> wrote:

> Hi,
>
> I just want to throw out a discussion topic on federation.
>
> Reading *The Definitive Guide* on HDFS, it sounds like when federating,
> every distinct namespace needs a distinct namenode machine instance.
>
> This means if a company wanted three namespaces, say retail, commercial,
> government, they would have to have a host machine (or machine pair for
> high-availability) for each one, so 3 (pair) namenode hosts?
>
> What if a company was hosting client data? Say they had 20 clients
> accessing a cluster. 20 namespaces minimum, would mean 20 servers just for
> namenodes?
>
> At what point in this situation would it become practical to begin
> virtualizing namenodes on a high-powered virtualization cluster? I think
> there would be some calculation that would go into as to the expected size
> of the namespace partition vs. block density vs. memory...there would also
> be the obvious question of resource contention and overall system drag
> caused by that...
>
> What do other community members think?
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>



-- 
Nitin Pawar
