Posted to hdfs-dev@hadoop.apache.org by lohit <lo...@gmail.com> on 2012/10/30 00:14:17 UTC

Running Application (YARN) on top of federated NameNode

Hi Devs,

I am trying to understand how to set up a cluster with federated NameNodes,
and specifically how YARN (or MR1) runs on top of it.
From the federation documentation (
http://hadoop.apache.org/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html)
I can see that each NameNode has its own namespace.
Can somebody help me understand how YARN would work with this? If I have 2
nameservices, how would YARN work with both of them? YARN and clients would
look at fs.defaultFS (which points to one nameservice) to resolve paths to
a DFS, right? Is the setup something like this: YARN and others connect to
one nameservice (call it the top-level nameservice), and admins set up
symlinks from the different nameservices to this top-level nameservice?

Thanks,
Lohit

Re: Running Application (YARN) on top of federated NameNode

Posted by lohit <lo...@gmail.com>.
Thanks Todd




-- 
Have a Nice Day!
Lohit

Re: Running Application (YARN) on top of federated NameNode

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Lohit,

There are basically three main options here:

1) Symlinks. As you suggested, one of the namespaces could hold top-level
cross-filesystem symlinks to the other namespaces in your cluster. The
downside is that symlinks are currently not well supported by the
FileSystem API, so you may run into serious issues using them with MR
applications.

2) Explicitly reference individual namespaces. This is basically separate
HDFS clusters that share a pool of datanodes. If you are using namespaces
to isolate entirely separate applications, then each app would just
reference its own namenode with no knowledge that the storage underneath
is pooled. A job can still have its input and output on different
namespaces, and that's completely fine.

3) Use viewfs (client-side mount tables). This is essentially a client-side
mapping of viewfs paths to the individual namenodes.
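
To make options 2 and 3 a bit more concrete, here is a minimal Python sketch
of the path resolution involved. It is only a toy model, not Hadoop's
implementation: the nameservice names (ns1, ns2), the mount points, and the
helper names qualify and resolve_viewfs are all made up for illustration,
and the dictionary merely stands in for a client-side mount table that
would normally live in the client configuration.

```python
from urllib.parse import urlparse

# Hypothetical values: nameservice names and mount points are assumptions.
DEFAULT_FS = "hdfs://ns1"            # stands in for fs.defaultFS
MOUNT_TABLE = {                      # stands in for a client-side mount table
    "/user": "hdfs://ns1/user",
    "/data": "hdfs://ns2/data",
}


def qualify(path, default_fs=DEFAULT_FS):
    """Option 2: a path that already names a filesystem is used as-is;
    a scheme-less path is resolved against the default filesystem."""
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


def resolve_viewfs(path, mount_table=MOUNT_TABLE):
    """Option 3: map a path to its target namespace by finding the
    longest mount point that is a component-wise prefix of the path."""
    parts = [p for p in path.split("/") if p]
    # Try the longest candidate prefix first, then progressively shorter ones.
    for n in range(len(parts), 0, -1):
        mount = "/" + "/".join(parts[:n])
        if mount in mount_table:
            rest = "/".join(parts[n:])
            target = mount_table[mount]
            return target + ("/" + rest if rest else "")
    raise FileNotFoundError(f"no mount point covers {path}")
```

With these assumed values, a job could read from qualify("/user/lohit/in")
on the default namespace and write to the explicit URI hdfs://ns2/data/out,
while resolve_viewfs("/data/logs") would send the same logical path to ns2
without the application naming a namenode at all.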

Hope that helps

-Todd




-- 
Todd Lipcon
Software Engineer, Cloudera