Posted to mapreduce-user@hadoop.apache.org by lohit <lo...@gmail.com> on 2013/05/15 17:55:31 UTC

Re: Question about Name Spaces…

Namespaces are mainly for NameNode scalability. If someone copies a file to
another namespace, they are essentially creating 6 copies of the same
file (3 replicas in each namespace).
To achieve redundancy for file names, it is better to use NameNode HA instead
of copying the file to another namespace. Since DataNodes serve blocks to
multiple namespaces, locality is not an issue, and copying a file to another
namespace would not buy you much.
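
The "6 copies" figure can be checked with a little arithmetic. This is only a sketch, assuming the 3x replication mentioned elsewhere in this thread and that each namespace keeps its own full set of replicas; the function name is made up for illustration and is not part of any Hadoop API.

```python
def physical_replicas(replication_factor: int, namespaces_holding_copy: int) -> int:
    """Total block replicas stored on DataNodes for one logical file.

    Assumes every namespace that holds a copy of the file replicates
    it independently at the same replication factor.
    """
    return replication_factor * namespaces_holding_copy


# One namespace, 3x replication: the usual case.
print(physical_replicas(3, 1))

# Same file copied into a second namespace: storage doubles.
print(physical_replicas(3, 2))
```

The extra replicas cost disk space without adding locality, since the same DataNodes already serve blocks for every namespace.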


2013/5/15 Michael Segel <mi...@hotmail.com>

> Well...
>
> On the one hand, I'm trying to understand why one would break a cluster
> into multiple name spaces.
> (Obviously this gets back to managing very large clusters.)
>
> On the other, why would someone want to have a copy of a file in two
> different name spaces?
>
> I'm making an assumption that when we have 3x replication that the
> replicas don't cross name space boundaries. (Is this correct?)
>
> My take is that one would copy a file to a second name space because they
> want a physical copy in both name spaces for redundancy in case a name
> space goes down. They would do this only for mission critical files, or if
> the data is being shared by two different groups who want their own copy of
> the data and they work solely within a single name space.
>
> The reason I am asking is that I'm trying to see how people view and use
> namespaces.
>
> Does that make sense?
>
> Thx
>
>
> On May 15, 2013, at 9:24 AM, Lohit <lo...@yahoo.com> wrote:
>
> > On May 15, 2013, at 7:17 AM, Michael Segel <mi...@hotmail.com> wrote:
> >
> >> Quick question...
> >> So when we have a cluster which has multiple namespaces (multiple name
> >> nodes), why would you have a file in two different namespaces?
> >>
> > Are you asking why one would create the same file in two namespaces? Or
> > are you asking whether there is an option to have only one file but in
> > two namespaces?
> >
> > Could you rephrase or give more information?


-- 
Have a Nice Day!
Lohit

Hardlinks (See HDFS-3370) was Re: Question about Name Spaces…

Posted by Michael Segel <mi...@hotmail.com>.
Ok, that's what I thought.

So here's my real question...

I'm looking at HDFS-3370 (see: https://issues.apache.org/jira/browse/HDFS-3370 )

There is some talk that one of the reasons hardlinks haven't been added is that it would be difficult to implement them across name spaces.
It goes back to the comments made by Sanjay.

In short, if what Lohit says is true, then when you replicate or use HBase, the files will stay within a single namespace.
So there shouldn't be a reason to have hardlinks span namespaces.

(Or am I missing something? ) 
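
To make the argument concrete, here is a toy model of why a hardlink is easy inside one namespace and awkward across two. It is a sketch only, not HDFS code and not the design proposed in HDFS-3370: the `Inode` and `Namespace` classes and all their methods are hypothetical, and it assumes each namespace keeps its own private inode table.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    blocks: list
    link_count: int = 1  # number of paths that point at this inode


@dataclass
class Namespace:
    name: str
    inodes: dict = field(default_factory=dict)  # inode id -> Inode
    paths: dict = field(default_factory=dict)   # path -> inode id
    next_id: int = 0

    def create(self, path, blocks):
        inode_id = self.next_id
        self.next_id += 1
        self.inodes[inode_id] = Inode(list(blocks))
        self.paths[path] = inode_id
        return inode_id

    def hardlink(self, src, dst, dst_namespace=None):
        # A link is just a second path entry for the same inode, so it
        # only works when both paths live in the same inode table.
        if dst_namespace is not None and dst_namespace is not self:
            raise ValueError("no shared inode table across namespaces")
        inode_id = self.paths[src]
        self.paths[dst] = inode_id
        self.inodes[inode_id].link_count += 1

    def delete(self, path):
        inode_id = self.paths.pop(path)
        inode = self.inodes[inode_id]
        inode.link_count -= 1
        if inode.link_count == 0:
            # Last link gone: only now can the blocks be reclaimed.
            del self.inodes[inode_id]


ns1 = Namespace("ns1")
ns2 = Namespace("ns2")
ns1.create("/data/file", ["blk_1", "blk_2"])
ns1.hardlink("/data/file", "/backup/file")  # same namespace: fine
ns1.delete("/data/file")                    # blocks survive via /backup/file
```

In this model a cross-namespace link would need two independent NameNodes to agree on one link count, which is the kind of coordination the single-namespace case avoids.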

Is HDFS-3370 still active or is there another JIRA talking about hardlinks on HDFS? 

Thx

-Mike