Posted to hdfs-dev@hadoop.apache.org by Suresh Srinivas <su...@yahoo-inc.com> on 2011/03/03 23:41:25 UTC

Merging Namenode Federation feature (HDFS-1052) to trunk

We have started pushing changes for namenode federation into the feature branch HDFS-1052. The work items are created as subtasks of the jira HDFS-1052 and are based on the design document published in the same jira. By the end of this week, we will complete pushing the changes to the HDFS-1052 branch. Though the changes in these jiras are already committed, please do provide your feedback on either HDFS-1052 or its subtasks. New items that come out of the feedback will be addressed in new jiras.

Current status of the development:
# The testing of this feature is underway. Most of the basic functionality has been tested both for a single namenode cluster (for backward compatibility) and with multiple namenodes.
# All the existing tests and newly added tests pass (same as trunk).

We plan on merging this branch to trunk after a week or two. This will help us continue to make future changes on the trunk. I will send an announcement before merging the federation branch into trunk.

Regards,
Suresh

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by suresh srinivas <sr...@gmail.com>.

> Namenode code is not changed at all.
Want to make sure I qualify this right. The change is not significant, other
than the added notion of the BPID (block pool ID) that the NN uses.

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by suresh srinivas <sr...@gmail.com>.

> That is a different motivation. The document talks about why you should use
> federation. I am asking about motivation of supporting the code base while
> not using it. At least this is how I understand Allen's question and some of
> my colleagues'.
>

Namenode code is not changed at all. Datanode code changes to add the notion
of block pool and a thread per NN. For a single NN, datanode is equivalent
to the current datanode. If you argue that there should not be any code
change - not sure how features like this can be added to HDFS. There is no
change from user perspective and performance of the system. No additional
complexity from the existing system.
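
To illustrate the datanode change described above, here is a minimal
Python sketch. It is not the HDFS implementation: the class name, the
namenode addresses and the block pool IDs are all hypothetical, and the
per-namenode loop is reduced to a single recorded report. It only shows
that with one NN configured the datanode runs exactly one service thread,
and that each additional NN adds one thread and one block pool.

```python
import threading


class ToyDataNode:
    """Toy model: one service thread per namenode, one block pool each."""

    def __init__(self, namenodes):
        # Hypothetical mapping of namenode address -> block pool ID.
        self.namenodes = dict(namenodes)
        self.reports = {}
        self._lock = threading.Lock()

    def _serve(self, nn_address, block_pool_id):
        # Stand-in for the heartbeat / block report loop of one namenode.
        with self._lock:
            self.reports[nn_address] = block_pool_id

    def start(self):
        # Start one thread per configured namenode and wait for all of them.
        threads = [
            threading.Thread(target=self._serve, args=(addr, pool))
            for addr, pool in self.namenodes.items()
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return len(threads)


single = ToyDataNode({"nn1:8020": "BP-1"})
federated = ToyDataNode({"nn1:8020": "BP-1", "nn2:8020": "BP-2"})
```

In this sketch the single-NN case is just the federated case with one
entry, which mirrors the claim that a datanode serving one NN behaves like
the current datanode.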


> If you could put some numbers in the jira for the reference.
>
Will do.


>
> Also it is interesting to know whether there is a benefit in splitting
> the namespace. Can I e.g. do more getBlockLocations per second?
> This is one of the aspects of scaling, right?
>

I do not understand your question. This feature does not scale
getBlockLocations per second for a single NN. When you use many NNs, total
requests per second does scale for the entire cluster.

> As we developed this feature, some significant improvements have been made
> to the system - fast snapshots (snapshot time down from 1hr 45 mins to 1
> min!), fast startup, cleanup of storage, fixing multi threading issues in
> several places, decommissioning improvements etc.
>

> > This is a valid concern. Hence the single namenode configuration that most
> > installations run today, will run as is. We put a lot of development and
> > testing effort to ensure this.
> >
>
> I don't know what you mean by "as is". My experience with this word in real
> estate tells me it can be anything.
>

I used the word with following meaning:
http://www.merriam-webster.com/dictionary/as%20is
— *as is*
*:* in the presently existing condition without modification

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Konstantin Shvachko <sh...@gmail.com>.

On Mon, Mar 14, 2011 at 11:19 PM, suresh srinivas <sr...@gmail.com> wrote:

> Thanks for starting off the discussion.
>
> > This is a huge new feature with 86 jiras already filed, which
> substantially increases the complexity of the code base.
> These are 86 jiras filed in a feature branch. We decided to make these
> changes, in smaller increments, instead of a jumbo patch. This was done in
> good faith, as community did not want a jumbo patch (as seen in several
> discussions), to make reviewing of the patch easy and to record the changes
> for reference.
>

Thanks for doing it that way.


> > Having an in-depth motivation and benchmarking will be needed before the
> community decides on adopting it for support.
> This comes as a surprise, especially from Konstantin :-). The first part of
> the proposal and design both cover motivation.
>

That is a different motivation. The document talks about why you should use
federation. I am asking about motivation of supporting the code base while
not using it. At least this is how I understand Allen's question and some of
my colleagues'.

> So far our tests show no difference with federation.
>

This is exactly what is needed.
If you could put some numbers in the jira for the reference.

Also it is interesting to know whether there is a benefit in splitting
the namespace. Can I e.g. do more getBlockLocations per second?
This is one of the aspects of scaling, right?


> As we developed this feature, some significant improvements have been made
> to the system - fast snapshots (snapshot time down from 1hr 45 mins to 1
> min!), fast startup, cleanup of storage, fixing multi threading issues in
> several places, decommissioning improvements etc.
>

This is motivation. I am glad I asked.


> > The purpose of my reply was to get this discussion going, as I found
> Allen's question unanswered for 2 weeks.
> My email was sent on March 3rd. Allen's email was sent on March 12th.
>

Sorry, my bad.


> > The concern he has seems legitimate to me. If ops think federation will
> "make running a grid much much harder" I want to know why and how much
> harder.
> I would like to understand the concerns here. Allen please add details.
>
> > The way I see it now, Federation introduces
> > - lots of code complexity to the system
> > - harder manageability, according to Allen
> > - potential performance degradation (tbd)
> I have addressed these already.
>
> > And the main question for those 95% of users, who don't run large
> clusters
> or don't want to place all their compute resources in one data center, is
> what is the advantage in supporting it?
> This is a valid concern. Hence the single namenode configuration that most
> installations run today, will run as is. We put a lot of development and
> testing effort to ensure this.
>

I don't know what you mean by "as is". My experience with this word in real
estate tells me it can be anything.

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by suresh srinivas <sr...@gmail.com>.

Thanks for starting off the discussion.

> This is a huge new feature with 86 jiras already filed, which
substantially increases the complexity of the code base.
These are 86 jiras filed in a feature branch. We decided to make these
changes, in smaller increments, instead of a jumbo patch. This was done in
good faith, as community did not want a jumbo patch (as seen in several
discussions), to make reviewing of the patch easy and to record the changes
for reference. Main changes have gone in a few jiras. Others are mainly
fixing the test failures, adding tests and fixing bugs introduced during
development. Please review the patch and provide feedback; we will address
the concerns.

> Having an in-depth motivation and benchmarking will be needed before the
community decides on adopting it for support.
This comes as a surprise, especially from Konstantin :-). The first part of
the proposal and design both cover motivation.

With regard to benchmarking - if you see the design, there is no big change
in the i/o subsystem. Most of the changes are in the organization of storage to
introduce block pools, block pool ID, a thread per namenode in datanode,
upgrade/rollback. Not sure what concerns you have with regard to
benchmarking. So far our tests show no difference with federation.

As we developed this feature, some significant improvements have been made
to the system - fast snapshots (snapshot time down from 1hr 45 mins to 1
min!), fast startup, cleanup of storage, fixing multi threading issues in
several places, decommissioning improvements etc.

> The purpose of my reply was to get this discussion going, as I found
Allen's question unanswered for 2 weeks.
My email was sent on March 3rd. Allen's email was sent on March 12th.

> The concern he has seems legitimate to me. If ops think federation will
"make running a grid much much harder" I want to know why and how much
harder.
I would like to understand the concerns here. Allen please add details.

> The way I see it now, Federation introduces
> - lots of code complexity to the system
> - harder manageability, according to Allen
> - potential performance degradation (tbd)
I have addressed these already.

> And the main question for those 95% of users, who don't run large clusters
or don't want to place all their compute resources in one data center, is
what is the advantage in supporting it?
This is a valid concern. Hence the single namenode configuration that most
installations run today, will run as is. We put a lot of development and
testing effort to ensure this.

Regards,
Suresh

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Travis Crawford <tr...@gmail.com>.

On Mon, Mar 14, 2011 at 6:12 PM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> Dhruba, good you are speaking up for federation.
> I consider it important as it means more support for the feature in the
> future.
>
> The purpose of my reply was to get this discussion going, as I found Allen's
> question unanswered for 2 weeks.
> The concern he has seems legitimate to me. If ops think federation will
> "make running a grid much much harder" I want to know why and how much
> harder.
> Because cluster "manageability" is claimed as one of the objectives of
> federation.
>
> I sure am well familiar with the design being a part of it for a while.
> And all my concerns have been articulated and well known. Though not all of
> them are addressed.
>
> The way I see it now, Federation introduces
> - lots of code complexity to the system
> - harder manageability, according to Allen
> - potential performance degradation (tbd)
> And the main question for those 95% of users, who don't run large clusters
> or don't want to place all their compute resources in one data center, is what
> is the advantage in supporting it?
>
> Performance-wise there are 2 main aspects:
> - Does federation give me the same cluster performance if I don't federate?
> - If I federate how much more throughput can I get?
>


This reminds me of multi-cell GFS (discussed by Quinlan & McKusick at
http://bit.ly/einKMn). I used to run some of those clusters, and
compared to standard single-master clusters of course they were more
complex to manage. However, if you have apps needing that much master
capacity & that much shared read+write bandwidth across large pools of
storage nodes, it's worth the trouble.

Assuming most people don't use federation it shouldn't add complexity
in the common case, but opens up some needed capabilities for large
sites. Stuff like datanode management would become more challenging in
a multi-master environment, but that's where automation comes in. If
you don't have teams building tools to manage your datacenter, it's
likely you don't need federation either.

I'm currently running a handful of HDFS clusters & my overall reaction
to federation is "that's cool, but I probably won't need it for a few
years." Seems like the sort of thing the vast majority of sites won't
even encounter - you'd just add datanodes to one master & start using
it.

--travis


> Thanks,
> --Konstantin

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Konstantin Shvachko <sh...@gmail.com>.

Dhruba, good you are speaking up for federation.
I consider it important as it means more support for the feature in the
future.

The purpose of my reply was to get this discussion going, as I found Allen's
question unanswered for 2 weeks.
The concern he has seems legitimate to me. If ops think federation will
"make running a grid much much harder" I want to know why and how much
harder.
Because cluster "manageability" is claimed as one of the objectives of
federation.

I sure am well familiar with the design being a part of it for a while.
And all my concerns have been articulated and well known. Though not all of
them are addressed.

The way I see it now, Federation introduces
- lots of code complexity to the system
- harder manageability, according to Allen
- potential performance degradation (tbd)
And the main question for those 95% of users, who don't run large clusters
or don't want to place all their compute resources in one data center, is what
is the advantage in supporting it?

Performance-wise there are 2 main aspects:
- Does federation give me the same cluster performance if I don't federate?
- If I federate how much more throughput can I get?

Thanks,
--Konstantin

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Dhruba Borthakur <dh...@gmail.com>.

Hi folks,

The design for the federation work has been published and there is a very
well-written design document. It explains the pros-and-cons of each design
point. It would be nice if more people can review this document and provide
comments on how to make it better. The implementation is in progress but
that does not mean that the
"design-is-cast-in-stone-and-cannot-be-enhanced".

Allen: can you please describe what you mean by "It sounds like merging into
trunk is extremely premature". If we can make all unit tests pass
successfully on the branch, then do you think we should merge that branch
into the trunk?

Konstantin: I agree that federation introduces new code complexity. But it
is a fact that introducing a new heavy-weight feature will add complexity.
If you have a different proposal (and implementation) to scale namenode,
please share it with us and we can then evaluate these designs in terms of
complexity/feature. If you have questions about certain issues in the
design, it would be great if you can ask them now. Hopefully, the folks
doing the implementation can then provide you performance numbers to
alleviate your concerns.

From the way I look at it, I think the federation-feature is a huge
positive step in the right direction.

thanks,
dhruba

-- 
Connect to me at http://www.facebook.com/dhruba

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Konstantin Shvachko <sh...@gmail.com>.

Allen is right.
This is a huge new feature with 86 jiras already filed, which substantially
increases the complexity of the code base.
Having an in-depth motivation and benchmarking will be needed before the
community decides on adopting it for support.
Thanks,
--Konstantin



On Sat, Mar 12, 2011 at 8:43 AM, Allen Wittenauer
<aw...@linkedin.com> wrote:

>
> On Mar 3, 2011, at 2:41 PM, Suresh Srinivas wrote:
>
> > We have started pushing changes for namenode federation in to the feature
> branch HDFS-1052. The work items are created as subtask of the jira
> HDFS-1052 and are based on the design document published in the same jira.
> By the end of this week, we will complete pushing the changes to HDFS-1052
> branch. Though the changes in these jiras are already committed, please do
> provide your feedback on either HDFS-1052 or its subtasks. New items that
> come out of the feedback will be addressed in new jiras.
>
> >
> > Current status of the development:
> > # The testing of this feature is underway. Most of the basic
> functionality has been tested both for a single namenode cluster (for
> backward compatibility) and with multiple namenodes.
> > # All the existing tests and newly added tests pass (same as trunk).
> >
> > We plan on merging this branch to trunk after a week or two. This will
> help us continue make future changes on the trunk. I will send an
> announcement before merging the federation branch into trunk.
> >
>
>        It sounds like merging into trunk is extremely premature.  That
> said, I'm still trying to understand the why's around this.
>
>        To me, this series of changes looks like it is going to make running
> a grid much much harder for very little benefit.  In particular, I don't see
> the difference between running multiple NN/DN combinations versus running
> federation, especially with client side mount tables in play.
>
>

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by suresh srinivas <sr...@gmail.com>.

>
>         But this does make things easier.  Although I'm still fairly
> confident that it adds too much complexity for little gain though.


Allen, can you please add details on what complexity you are talking about
here? (I have already asked this question many times)

From a code perspective it is not adding complexity, as I have explained
before.

You could choose to run the cluster with a single namenode and not see much
difference. But in our case federation does solve the complications of setting
up multiple clusters, balancing storage across the clusters, the lack of a
single view, and duplication of data.

> So put this in the 'agree to disagree' column.  It would still be nice if
> you guys could lay off the camelCase options though.  Admins hate the shift
> key.
>

I did reply to your comment saying the options are case insensitive.



>
>        BTW, Robert C. asked what I thought you guys should have been
> working on instead of Federation.  I told him (and you) high availability of
> the namenode (which I still believe is necessary for HDFS in more and more
> cases), but I've had more time to think about it.  So expect my list (which
> I'll post here) soon.  :p
>
Federation is solving an important problem for us. We are looking at HA, as
you might have seen in some of the jira activities.

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Allen Wittenauer <aw...@apache.org>.

On Mar 21, 2011, at 4:08 PM, Sanjay Radia wrote:
> 
> Allen, not sure if I explained the difference above.
> Based on the discussion we had at the HUG, I want to clarify a few things

	Thanks for taking the time at HUG.  (I've since figured out that I lost your messages as part of my email list transition.)

> A DN stores blocks for only ONE cluster.


	But this does make things easier.  Although I'm still fairly confident that it adds too much complexity for little gain though.  So put this in the 'agree to disagree' column.  It would still be nice if you guys could lay off the camelCase options though.  Admins hate the shift key.

	BTW, Robert C. asked what I thought you guys should have been working on instead of Federation.  I told him (and you) high availability of the namenode (which I still believe is necessary for HDFS in more and more cases), but I've had more time to think about it.  So expect my list (which I'll post here) soon.  :p

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by suresh srinivas <sr...@gmail.com>.

>
>
> A few questions:
> - Do we have a clear definition for a cluster?
>

Before federation, a cluster is defined by the list of datanodes in the include
file, bound together by the namespaceID of the namenode that these nodes bind to
on first registration with the namenode. In essence, the namespaceID defines the
cluster nodes.

In a federated cluster, the namenodes are set up with the same clusterID. The
clusterID is established at the datanodes when they first register with a
namenode. So nodes with the same clusterID are part of the cluster.

> - With the above definition, is it an error if not all DNs belong to the
> same set of NNs?
>
A DN has to belong to the same set of NNs sharing the same clusterID. DNs cannot
register with a namenode that has a different clusterID.
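
The registration rule above can be sketched in a few lines of Python. This
is an illustration only, not HDFS code: the class names, the exception and
the IDs are hypothetical. It assumes a datanode adopts the clusterID of
the first namenode it registers with and rejects any namenode whose
clusterID differs.

```python
class ClusterIdMismatch(Exception):
    """Raised when a namenode's clusterID differs from the datanode's."""


class ToyNameNode:
    def __init__(self, name, cluster_id):
        self.name = name
        self.cluster_id = cluster_id


class ToyDataNode:
    def __init__(self):
        self.cluster_id = None  # unset until the first registration
        self.namenodes = []

    def register(self, nn):
        if self.cluster_id is None:
            # First registration establishes the datanode's clusterID.
            self.cluster_id = nn.cluster_id
        elif self.cluster_id != nn.cluster_id:
            raise ClusterIdMismatch(
                f"{nn.name} has clusterID {nn.cluster_id}, "
                f"datanode belongs to {self.cluster_id}"
            )
        self.namenodes.append(nn.name)


dn = ToyDataNode()
dn.register(ToyNameNode("nn1", "CID-A"))
dn.register(ToyNameNode("nn2", "CID-A"))  # same cluster, accepted
```

Registering the same datanode with a namenode carrying "CID-B" would raise
ClusterIdMismatch in this sketch.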


> - With the working definition of a cluster, what namespace guarantees are
> given to clients?
>
I am not sure what you mean by this.

>
>

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Brian Bockelman <bb...@cse.unl.edu>.

On Mar 21, 2011, at 6:08 PM, Sanjay Radia wrote:

> 
> On Mar 14, 2011, at 10:57 AM, Sanjay Radia wrote:
> 
>> 
>> On Mar 12, 2011, at 8:43 AM, Allen Wittenauer wrote:
>> 
>>> 
>>> 	To me, this series of changes looks like it is going to make
>>> running a grid much much harder for very little benefit.  In
>>> particular, I don't see the difference between running multiple NN/
>>> DN combinations versus running federation, especially with client
>>> side mount tables in play.
>> 
>> 
>> 
>> Main difference between independent HDFS clusters and HDFS federation
>> is that in federation one can share the DNs and their storage.
>> There is a very detailed document that describes this on the Jira.
>> 
>> If you are running a single NN and you don't need the scaling then
>> running and managing hadoop is for all practical purposes unchanged.
>> 
>> 
>> sanjay
>>> 
>> 
> 
> 
> Allen, not sure if I explained the difference above.
> Based on the discussion we had at the HUG, I want to clarify a few things
> 
> In federation the NNs and the DNs are part of a cluster. It is not as if a data node is willing to store blocks for any NN anywhere in the data center.
> We still expect a data center to have multiple hadoop clusters, each with a set of data nodes and each cluster with 1 or more NNs.
> A DN stores blocks for only ONE cluster.

A few questions:
- Do we have a clear definition for a cluster?
- With the above definition, is it an error if not all DNs belong to the same set of NNs?
- With the working definition of a cluster, what namespace guarantees are given to clients?

The reason I ask is not because I oppose the idea of federations, but rather am curious about the terminology and how it's 'advertised' to the user.  I rather like the design; it has similar ideas to an NSF project I've seen (http://www.reddnet.org/).

> 
> You had asked about how one debugs a corrupt file or corrupt block.
> In the old world a file's inode contains the block ids of its blocks. There is also a mapping from block id to block location (ie which DN).
> In the federated hdfs, each block is identified by a longer block id, called the extended block id= blockPool Id + block id.
> A block pool is owned by only ONE NN.
> Hence if you are trying to locate a block then you map the extended block id to the block location (ie DN) - this is the same as before, except that the identifier
> of the block is merely longer.
> 
> If you are trying to debug from the point of view of the DN:
> In federated HDFS, the blocks stored in the DN are segregated into directories by the blockPool Id.
> The block pool id can be mapped to a NN since each block pool has only ONE owner.
> Hence mapping a block to a particular NN is easy: the first part of the block's longer identifier tells you which NN owns that block.
> 

This sounds good.

Brian

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Sanjay Radia <sr...@yahoo-inc.com>.

On Mar 14, 2011, at 10:57 AM, Sanjay Radia wrote:

>
> On Mar 12, 2011, at 8:43 AM, Allen Wittenauer wrote:
>
>>
>> 	To me, this series of changes looks like it is going to make
>> running a grid much much harder for very little benefit.  In
>> particular, I don't see the difference between running multiple NN/
>> DN combinations versus running federation, especially with client
>> side mount tables in play.
>
>
>
> The main difference between independent HDFS clusters and HDFS federation
> is that in federation the NNs share the DNs and the DNs' storage.
> There is a very detailed document that describes this on the Jira.
>
> If you are running a single NN and you don't need the scaling then
> running and managing hadoop is for all practical purposes unchanged.
>
>
> sanjay
>>
>

Allen, not sure if I explained the difference above.
Based on the discussion we had at the HUG, I want to clarify a few things.

In federation the NNs and the DNs are part of a cluster. It is not as
if a data node is willing to store blocks for any NN anywhere in the
data center.
We still expect a data center to have multiple hadoop clusters, each
with a set of data nodes and each cluster with 1 or more NNs.
A DN stores blocks for only ONE cluster.

You had asked about how one debugs a corrupt file or corrupt block.
In the old world a file's inode contains the block ids of its blocks.
There is also a mapping from block id to block location (i.e. which DN).
In federated HDFS, each block is identified by a longer block id,
called the extended block id = blockPool Id + block id.
A block pool is owned by only ONE NN.
Hence if you are trying to locate a block, you map the extended
block id to the block location (i.e. DN). This is the same as before,
except that the identifier of the block is merely longer.
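
The lookup described above can be sketched roughly as follows. This is
only an illustration, not the HDFS implementation: the class name
ExtendedBlockId, the pool ids "BP-A" and "BP-B", the DN names and the
locate() helper are all made up for the example.

```python
from typing import Dict, List, NamedTuple


class ExtendedBlockId(NamedTuple):
    """Hypothetical model of the longer identifier: block pool id + block id."""
    block_pool_id: str
    block_id: int


# Old world: the key is the plain block id, the value is the DNs holding it.
old_locations: Dict[int, List[str]] = {1001: ["dn1", "dn2"]}

# Federated world: same kind of map, only the key is longer.
# The same numeric block id can appear in two pools without clashing.
fed_locations: Dict[ExtendedBlockId, List[str]] = {
    ExtendedBlockId("BP-A", 1001): ["dn1", "dn2"],
    ExtendedBlockId("BP-B", 1001): ["dn2", "dn3"],
}


def locate(block: ExtendedBlockId) -> List[str]:
    """Return the DNs holding the block, or an empty list if it is unknown."""
    return fed_locations.get(block, [])
```

Looking up ExtendedBlockId("BP-A", 1001) and ExtendedBlockId("BP-B", 1001)
gives two different sets of DNs, even though the short block id is the
same in both.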

If you are trying to debug from the point of view of the DN:
In federated HDFS, the blocks stored in the DN are segregated into
directories by the blockPool Id.
The block pool id can be mapped to a NN since each block pool has
only ONE owner.
Hence mapping a block to a particular NN is easy: the first part
of the block's longer identifier tells you which NN owns that
block.
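
From the DN side, the same reasoning might look like the sketch below.
It assumes, purely for illustration, that the block pool id is the
first directory under the DN data directory; the real on-disk layout
may differ. The paths, pool ids, host names and both helper functions
are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical table: each block pool has exactly ONE owning NN.
pool_owner: Dict[str, str] = {
    "BP-A": "nn1.example.com",
    "BP-B": "nn2.example.com",
}


def pool_of(block_file: str, data_dir: str) -> str:
    """Return the block pool id, assumed to be the first path
    component below the DN data directory."""
    relative = PurePosixPath(block_file).relative_to(data_dir)
    return relative.parts[0]


def owner_of(block_file: str, data_dir: str) -> str:
    """Map a block file on a DN to the NN that owns its block pool."""
    return pool_owner[pool_of(block_file, data_dir)]
```

For example, owner_of("/data/dn/BP-A/blk_1001", "/data/dn") resolves the
pool "BP-A" first and then its single owner.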

sanjay

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Sanjay Radia <sr...@yahoo-inc.com>.

On Mar 12, 2011, at 8:43 AM, Allen Wittenauer wrote:

>
> On Mar 3, 2011, at 2:41 PM, Suresh Srinivas wrote:
>
>> We have started pushing changes for namenode federation into the
>> feature branch HDFS-1052. The work items are created as subtasks of
>> the jira HDFS-1052 and are based on the design document published
>> in the same jira. By the end of this week, we will complete pushing  
>> the changes to HDFS-1052 branch. Though the changes in these jiras  
>> are already committed, please do provide your feedback on either  
>> HDFS-1052 or its subtasks. New items that come out of the feedback  
>> will be addressed in new jiras.
>
>>
>> Current status of the development:
>> # The testing of this feature is underway. Most of the basic  
>> functionality has been tested both for a single namenode cluster  
>> (for backward compatibility) and with multiple namenodes.
>> # All the existing tests and newly added tests pass (same as trunk).
>>
>> We plan on merging this branch to trunk after a week or two. This
>> will help us continue to make future changes on the trunk. I will send
>> an announcement before merging the federation branch into trunk.
>>
>
> 	It sounds like merging into trunk is extremely premature.  That  
> said, I'm still trying to understand the why's around this.
>
> 	To me, this series of changes looks like it is going to make  
> running a grid much much harder for very little benefit.  In  
> particular, I don't see the difference between running multiple NN/ 
> DN combinations versus running federation, especially with client
> side mount tables in play.



The main difference between independent HDFS clusters and HDFS federation
is that in federation the NNs share the DNs and the DNs' storage.
There is a very detailed document that describes this on the Jira.

If you are running a single NN and you don't need the scaling then  
running and managing hadoop is for all practical purposes unchanged.


sanjay
>

Re: Merging Namenode Federation feature (HDFS-1052) to trunk

Posted by Allen Wittenauer <aw...@linkedin.com>.

On Mar 3, 2011, at 2:41 PM, Suresh Srinivas wrote:

> We have started pushing changes for namenode federation in to the feature branch HDFS-1052. The work items are created as subtask of the jira HDFS-1052 and are based on the design document published in the same jira. By the end of this week, we will complete pushing the changes to HDFS-1052 branch. Though the changes in these jiras are already committed, please do provide your feedback on either HDFS-1052 or its subtasks. New items that come out of the feedback will be addressed in new jiras.

> 
> Current status of the development:
> # The testing of this feature is underway. Most of the basic functionality has been tested both for a single namenode cluster (for backward compatibility) and with multiple namenodes.
> # All the existing tests and newly added tests pass (same as trunk).
> 
> We plan on merging this branch to trunk after a week or two. This will help us continue to make future changes on the trunk. I will send an announcement before merging the federation branch into trunk.
> 

	It sounds like merging into trunk is extremely premature.  That said, I'm still trying to understand the why's around this.

	To me, this series of changes looks like it is going to make running a grid much, much harder for very little benefit.  In particular, I don't see the difference between running multiple NN/DN combinations versus running federation, especially with client-side mount tables in play.