Posted to mapreduce-user@hadoop.apache.org by Michael Segel <mi...@hotmail.com> on 2013/05/15 16:17:26 UTC

Question about Name Spaces…

Quick question...
So when we have a cluster which has multiple namespaces (multiple name nodes), why would you have a file in two different namespaces?



Re: Question about Name Spaces…

Posted by Michael Segel <mi...@hotmail.com>.
Harsh, 

IMHO, I don't see the need for any sort of links across name spaces or containers.
If you want data that spans containers, you copy the data to the container rather than link to it.

That's why I have a hard time understanding why someone would want to hard link over a namespace. 

I'm trying to understand an argument made against HDFS-3370.

Thx
-Mike

On May 16, 2013, at 12:14 AM, Harsh J <ha...@cloudera.com> wrote:

> Do you see viewfs mounts coming useful there (i.e. in place of
> hardlinks across NSes)?
> 
> On Thu, May 16, 2013 at 3:49 AM, Michael Segel
> <mi...@hotmail.com> wrote:
>> Actually creating links, symbolic or hard links makes sense in a couple of scenarios.
>> Especially in terms of hive... ;-)
>> 
>> So it kind of goes back to my extension of the question about that Jira (HDFS-3370) to see if its alive or just forgotten?
>> 
>> The point is that one of the arguments against doing it didn't make sense. Creating hard links across Name Spaces.
>> IMHO you'd want to create hard links within the same NN. Maybe a symbolic link across name spaces, but even then, I'm not so sure... still need to think more about the problem.
>> 
>> On May 15, 2013, at 1:30 PM, Harsh J <ha...@cloudera.com> wrote:
>> 
>>> Namespace divides are designed with application-level separation in
>>> mind. Sharing a file across namespaces does not make a whole lot of
>>> sense to me.
>>> 
>>> Anyhow, the data is on the same set of DNs, and there's HA for NN's
>>> own availability (if thats really a concern), so I don't see why
>>> anyone would like to _maintain_ two synced copies of files as thats
>>> just data duplication when all you need is a simple path (viewfs)/URI
>>> (hdfs) to access a file lying on a different NN.
>>> 
>>> The reason you mention of metadata availability doesn't sound logical
>>> - in such a case a person has to build a self failover of URIs for
>>> said file, which they can simply avoid by using HDFS HA for the
>>> hosting NN.
>>> 
>>> On Wed, May 15, 2013 at 7:47 PM, Michael Segel
>>> <mi...@hotmail.com> wrote:
>>>> Quick question...
>>>> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces?
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Harsh J
>>> 
>> 
> 
> 
> 
> -- 
> Harsh J
> 


Re: Question about Name Spaces…

Posted by Harsh J <ha...@cloudera.com>.
Do you see viewfs mounts being useful there (i.e. in place of
hardlinks across NSes)?

On Thu, May 16, 2013 at 3:49 AM, Michael Segel
<mi...@hotmail.com> wrote:
> Actually creating links, symbolic or hard links makes sense in a couple of scenarios.
> Especially in terms of hive... ;-)
>
> So it kind of goes back to my extension of the question about that Jira (HDFS-3370) to see if its alive or just forgotten?
>
> The point is that one of the arguments against doing it didn't make sense. Creating hard links across Name Spaces.
> IMHO you'd want to create hard links within the same NN. Maybe a symbolic link across name spaces, but even then, I'm not so sure... still need to think more about the problem.
>
> On May 15, 2013, at 1:30 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Namespace divides are designed with application-level separation in
>> mind. Sharing a file across namespaces does not make a whole lot of
>> sense to me.
>>
>> Anyhow, the data is on the same set of DNs, and there's HA for NN's
>> own availability (if thats really a concern), so I don't see why
>> anyone would like to _maintain_ two synced copies of files as thats
>> just data duplication when all you need is a simple path (viewfs)/URI
>> (hdfs) to access a file lying on a different NN.
>>
>> The reason you mention of metadata availability doesn't sound logical
>> - in such a case a person has to build a self failover of URIs for
>> said file, which they can simply avoid by using HDFS HA for the
>> hosting NN.
>>
>> On Wed, May 15, 2013 at 7:47 PM, Michael Segel
>> <mi...@hotmail.com> wrote:
>>> Quick question...
>>> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces?
>>>
>>>
>>
>>
>>
>> --
>> Harsh J
>>
>



-- 
Harsh J

Re: Question about Name Spaces…

Posted by Michael Segel <mi...@hotmail.com>.
Actually, creating links, whether symbolic or hard, makes sense in a couple of scenarios.
Especially in terms of Hive... ;-)

So it kind of goes back to my extension of the question about that Jira (HDFS-3370): is it alive or just forgotten?

The point is that one of the arguments against doing it, the difficulty of creating hard links across name spaces, didn't make sense.
IMHO you'd want to create hard links within the same NN. Maybe a symbolic link across name spaces, but even then, I'm not so sure... still need to think more about the problem.

On May 15, 2013, at 1:30 PM, Harsh J <ha...@cloudera.com> wrote:

> Namespace divides are designed with application-level separation in
> mind. Sharing a file across namespaces does not make a whole lot of
> sense to me.
> 
> Anyhow, the data is on the same set of DNs, and there's HA for NN's
> own availability (if thats really a concern), so I don't see why
> anyone would like to _maintain_ two synced copies of files as thats
> just data duplication when all you need is a simple path (viewfs)/URI
> (hdfs) to access a file lying on a different NN.
> 
> The reason you mention of metadata availability doesn't sound logical
> - in such a case a person has to build a self failover of URIs for
> said file, which they can simply avoid by using HDFS HA for the
> hosting NN.
> 
> On Wed, May 15, 2013 at 7:47 PM, Michael Segel
> <mi...@hotmail.com> wrote:
>> Quick question...
>> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces?
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
> 


Re: Question about Name Spaces…

Posted by Harsh J <ha...@cloudera.com>.
Namespace divides are designed with application-level separation in
mind. Sharing a file across namespaces does not make a whole lot of
sense to me.

Anyhow, the data is on the same set of DNs, and there's HA for the NN's
own availability (if that's really a concern), so I don't see why
anyone would want to _maintain_ two synced copies of files, as that's
just data duplication when all you need is a simple path (viewfs)/URI
(hdfs) to access a file lying on a different NN.

The reason you mention, metadata availability, doesn't sound logical
- in such a case a person has to build their own failover of URIs for
said file, which they can simply avoid by using HDFS HA for the
hosting NN.

On Wed, May 15, 2013 at 7:47 PM, Michael Segel
<mi...@hotmail.com> wrote:
> Quick question...
> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces?
>
>



-- 
Harsh J
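
The "simple path (viewfs)" idea above can be pictured as a client-side mount table that maps path prefixes to the namespace hosting them, so a file is reached by one path instead of being copied or linked. The sketch below is a toy model only and is not the Hadoop API; the namenode hosts, port, and paths are hypothetical. In a real cluster the links are, as far as I know, declared in the client configuration with properties of the form fs.viewfs.mounttable.<name>.link.<path>.

```python
# Toy model of a client-side mount table; this is NOT the Hadoop API.
# The namenode hosts, port, and paths below are hypothetical.
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020/user",
    "/hive": "hdfs://nn2.example.com:8020/warehouse",
}


def resolve(path: str) -> str:
    """Map a client-visible path to the namespace URI that hosts it."""
    # Try the longest mount point first so nested mounts would win.
    for mount in sorted(MOUNT_TABLE, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return MOUNT_TABLE[mount] + path[len(mount):]
    raise FileNotFoundError(f"no mount point covers {path}")


print(resolve("/hive/sales/part-00000"))
```

The client sees a single tree (/user, /hive) while each subtree lives on a different NN, which is why no second copy of the file is needed.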

Hardlinks (See HDFS-3370) was Re: Question about Name Spaces…

Posted by Michael Segel <mi...@hotmail.com>.
Ok, that's what I thought.

So here's my real question...

I'm looking at HDFS-3370 (see: https://issues.apache.org/jira/browse/HDFS-3370 )

There is some talk that one of the reasons hardlinks haven't been added is that it would be difficult to implement hardlinks across name spaces.
It goes back to the comments made by Sanjay.

In short, if what Lohit says is true, then when you replicate or use HBase, the files will stay within the single namespace. 
So there shouldn't be a reason to have hardlinks span namespaces. 

(Or am I missing something? ) 

Is HDFS-3370 still active or is there another JIRA talking about hardlinks on HDFS? 

Thx

-Mike

On May 15, 2013, at 10:55 AM, lohit <lo...@gmail.com> wrote:

> Namespaces are mainly for NameNode scalability. If someone copies a file to another namespace, they would essentially be creating 6 copies of the same file.
> To achieve file name redundancy, it is better to have NameNode HA instead of copying the file to another namespace. Since DataNodes serve blocks to multiple namespaces, locality is not an issue, and copying a file to another namespace would not buy you much.
> 
> 
> 2013/5/15 Michael Segel <mi...@hotmail.com>
> Well...
> 
> On the one hand, I'm trying to understand why one would break a cluster in to multiple name spaces.
> (Obviously this gets back to managing very large clusters.)
> 
> On the other. Why would someone want to have a copy of a file in two different name spaces?
> 
> I'm making an assumption that when we have 3x replication that the replicas don't cross name space boundaries. (Is this correct?)
> 
> My take is that one would copy a file to a second name space because they want a physical copy in both name spaces for redundancy in case a name space goes down. They would do this only for mission critical files, or if the data is being shared by two different groups who want their own copy of the data and they work solely within a single name space.
> 
> The reason I am asking is that I'm trying to see how people view and use namespaces.
> 
> Does that make sense?
> 
> Thx
> 
> 
> On May 15, 2013, at 9:24 AM, Lohit <lo...@yahoo.com> wrote:
> 
> >
> >
> > On May 15, 2013, at 7:17 AM, Michael Segel <mi...@hotmail.com> wrote:
> >
> >> Quick question...
> >> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces?
> >>
> > Are you saying why one would create same file in two namespace? Or are you saying is there an option to have only one file but in two namespace?
> >
> > Could you rephrase or give more information
> >>
> >
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit


Hardlinkes (See HDFS-3370) wuz Re: Question about Name Spaces…

Posted by Michael Segel <mi...@hotmail.com>.
Ok, that's what I thought.

So here's my real question...

I'm looking at HDFS-3370 (see: https://issues.apache.org/jira/browse/HDFS-3370 )

There is some talk that one of the reasons hardlinks haven't been added is that they would be difficult to implement across namespaces. 
This goes back to the comments made by Sanjay.

In short, if what Lohit says is true, then when you replicate or use HBase, the files will stay within a single namespace. 
So there shouldn't be a reason to have hardlinks span namespaces. 

(Or am I missing something? ) 

Is HDFS-3370 still active or is there another JIRA talking about hardlinks on HDFS? 

Thx

-Mike

On May 15, 2013, at 10:55 AM, lohit <lo...@gmail.com> wrote:

> Namespace is mainly for Namenode scalability. If someone copies file to another namespace, then essentially they would be creating 6 copies of same file. 
> To achieve file name redundancy, it is better to have NameNode HA, instead of copying it to another namespace. Since Datanodes serve blocks to multiple namespace, locality is not an issue and copying file to another namespace would not buy you much.  
> 
> 
> 2013/5/15 Michael Segel <mi...@hotmail.com>
> Well...
> 
> On the one hand, I'm trying to understand why one would break a cluster in to multiple name spaces.
> (Obviously this gets back to managing very large clusters.)
> 
> On the other. Why would someone want to have a copy of a file in two different name spaces?
> 
> I'm making an assumption that when we have 3x replication that the replicas don't cross name space boundaries. (Is this correct?)
> 
> My take is that one would copy a file to a second name space because they want a physical copy in both name spaces for redundancy in case a name space goes down. They would do this only for mission critical files, or if the data is being shared by two different groups who want their own copy of the data and they work solely within a single name space.
> 
> The reason I am asking is that I'm trying to see how people view and use namespaces.
> 
> Does that make sense?
> 
> Thx
> 
> 
> On May 15, 2013, at 9:24 AM, Lohit <lo...@yahoo.com> wrote:
> 
> >
> >
> > On May 15, 2013, at 7:17 AM, Michael Segel <mi...@hotmail.com> wrote:
> >
> >> Quick question...
> >> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces?
> >>
> > Are you saying why one would create same file in two namespace? Or are you saying is there an option to have only one file but in two namespace?
> >
> > Could you rephrase or give more information
> >>
> >
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit


Re: Question about Name Spaces…

Posted by lohit <lo...@gmail.com>.
Namespaces are mainly for NameNode scalability. If someone copies a file to
another namespace, they would essentially be creating 6 copies of the same
file (3 replicas in each namespace, assuming 3x replication).
To achieve file name redundancy, it is better to have NameNode HA than to
copy the file to another namespace. Since DataNodes serve blocks to
multiple namespaces, locality is not an issue, and copying a file to another
namespace would not buy you much.
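
To make the "6 copies" arithmetic concrete, here is a minimal sketch. It is toy arithmetic only and not an HDFS API; the function name and parameters are made up for illustration, and it assumes the 3x replication mentioned earlier in the thread.

```python
# Toy arithmetic only; nothing here calls HDFS.
# Assumption: every namespace that holds a copy of the file keeps its own
# full set of block replicas on the shared DataNodes.
def physical_replicas(namespaces_holding_copy: int, replication_factor: int = 3) -> int:
    """Total block replicas stored on the DataNodes for one logical file."""
    if namespaces_holding_copy < 0 or replication_factor < 0:
        raise ValueError("counts must be non-negative")
    return namespaces_holding_copy * replication_factor


if __name__ == "__main__":
    print(physical_replicas(1))  # file kept in one namespace
    print(physical_replicas(2))  # same file copied into a second namespace
```

The second call is the case discussed here: the same DataNodes end up storing twice as many replicas, which buys no extra locality.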


2013/5/15 Michael Segel <mi...@hotmail.com>

> Well...
>
> On the one hand, I'm trying to understand why one would break a cluster in
> to multiple name spaces.
> (Obviously this gets back to managing very large clusters.)
>
> On the other. Why would someone want to have a copy of a file in two
> different name spaces?
>
> I'm making an assumption that when we have 3x replication that the
> replicas don't cross name space boundaries. (Is this correct?)
>
> My take is that one would copy a file to a second name space because they
> want a physical copy in both name spaces for redundancy in case a name
> space goes down. They would do this only for mission critical files, or if
> the data is being shared by two different groups who want their own copy of
> the data and they work solely within a single name space.
>
> The reason I am asking is that I'm trying to see how people view and use
> namespaces.
>
> Does that make sense?
>
> Thx
>
>
> On May 15, 2013, at 9:24 AM, Lohit <lo...@yahoo.com> wrote:
>
> >
> >
> > On May 15, 2013, at 7:17 AM, Michael Segel <mi...@hotmail.com>
> wrote:
> >
> >> Quick question...
> >> So when we have a cluster which has multiple namespaces (multiple name
> nodes) , why would you have a file in two different namespaces?
> >>
> > Are you saying why one would create same file in two namespace? Or are
> you saying is there an option to have only one file but in two namespace?
> >
> > Could you rephrase or give more information
> >>
> >
>
>


-- 
Have a Nice Day!
Lohit

Re: Question about Name Spaces…

Posted by Michael Segel <mi...@hotmail.com>.
Well... 

On the one hand, I'm trying to understand why one would break a cluster into multiple namespaces. 
(Obviously this gets back to managing very large clusters.) 

On the other hand, why would someone want to have a copy of a file in two different namespaces? 

I'm assuming that when we have 3x replication, the replicas don't cross namespace boundaries. (Is this correct?)

My take is that one would copy a file to a second namespace because they want a physical copy in both namespaces for redundancy in case a namespace goes down. They would do this only for mission-critical files, or if the data is being shared by two different groups who each want their own copy of the data and work solely within a single namespace. 

The reason I am asking is that I'm trying to see how people view and use namespaces. 

Does that make sense? 

Thx


On May 15, 2013, at 9:24 AM, Lohit <lo...@yahoo.com> wrote:

> 
> 
> On May 15, 2013, at 7:17 AM, Michael Segel <mi...@hotmail.com> wrote:
> 
>> Quick question...
>> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces? 
>> 
> Are you saying why one would create same file in two namespace? Or are you saying is there an option to have only one file but in two namespace? 
> 
> Could you rephrase or give more information 
>> 
> 


Re: Question about Name Spaces…

Posted by Lohit <lo...@yahoo.com>.

On May 15, 2013, at 7:17 AM, Michael Segel <mi...@hotmail.com> wrote:

> Quick question...
> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces? 
> 
Are you asking why one would create the same file in two namespaces? Or are you asking whether there is an option to have only one file that appears in two namespaces? 

Could you rephrase or give more information? 
> 

Re: Question about Name Spaces…

Posted by Harsh J <ha...@cloudera.com>.
Namespace divides are designed with application-level separation in
mind. Sharing a file across namespaces does not make a whole lot of
sense to me.

Anyhow, the data is on the same set of DNs, and there's HA for the NN's
own availability (if that's really a concern), so I don't see why
anyone would want to _maintain_ two synced copies of files, as that's
just data duplication when all you need is a simple path (viewfs) or URI
(hdfs) to access a file lying on a different NN.

The reason you mention, metadata availability, doesn't sound logical:
in such a case a person has to build their own failover of URIs for
said file, which they can simply avoid by using HDFS HA for the
hosting NN.
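
As a rough illustration of the "simple path (viewfs)" idea above, the sketch below models a client-side mount table that maps path prefixes to the NameNode owning them. It is a toy model, not Hadoop's ViewFs implementation; the host names, port, mount points, and the `resolve` helper are all hypothetical.

```python
# Toy model of a client-side mount table; not Hadoop code.
# Hypothetical mounts: path prefix -> URI on the NameNode that owns it.
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020/user",
    "/hive": "hdfs://nn2.example.com:8020/hive",
}


def resolve(path: str, mounts: dict = MOUNT_TABLE) -> str:
    """Map a client-side path to a full URI using the longest matching mount point."""
    best = None
    for prefix in mounts:
        # Match the mount point itself or anything strictly below it.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    # Keep the part of the path below the mount point unchanged.
    return mounts[best] + path[len(best):]


if __name__ == "__main__":
    print(resolve("/hive/warehouse/t1/part-00000"))
```

The point of the model is that the file exists once, under one NameNode, and every client reaches it through the same path without a second copy or a cross-namespace link.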

On Wed, May 15, 2013 at 7:47 PM, Michael Segel
<mi...@hotmail.com> wrote:
> Quick question...
> So when we have a cluster which has multiple namespaces (multiple name nodes) , why would you have a file in two different namespaces?
>
>



-- 
Harsh J
