Posted to hdfs-dev@hadoop.apache.org by Milind Bhandarkar <mb...@gopivotal.com> on 2013/10/11 05:31:48 UTC

hdfs project separation

(This message is not directed at specific folks by mistake; it is intended for the whole hdfs-dev list, deliberately.)

Hello Folks,

I do not want to reopen already bleeding wounds; I want to resolve these issues amicably, without causing a big inter-vendor confrontation.

So, these are the facts, as I (and several others in the hadoop community) see them.

1. There was an attempt to separate the different Hadoop projects, such as common, hdfs, and mapreduce.

2. That attempt was aborted for several reasons, with common ownership (i.e., committership) being the biggest issue.

3. In the meanwhile, several important, release-worthy hdfs improvements were committed to Hadoop. (That's why I supported Konst's appeal for 0.22. These improvements were also incorporated into Hadoop products by the largest hadoop ecosystem contributor, and several others.)

4. All the Apache Hadoop bylaws were followed to get these improvements into the Hadoop project.

5. Yet the common project, which is not even a top-level project since the awkward re-merge happened, got an incompatible wire-protocol change, which was accepted and promoted by a specific section of the community, in spite of the kicking and screaming of (what I think of as) a representative of a large hadoop user community.

6. That change, and others like it, have created a big issue for the part of the community that has tested the hdfs part of 2.x and has spent a lot of effort to stabilize hdfs, since this was the major front of the assault from proprietary storage systems, such as You-Know-Who.

I would like to raise this issue as an individual, regardless of my affiliation, so that we can make hdfs worthy of its association with the top-level ecosystem, without being closely coupled to it.

What do the hdfs developers think? 

- milind

Sent from my iPhone

RE: hdfs project separation

Posted by Milind Bhandarkar <mb...@gopivotal.com>.
Doug,

Your understanding is correct. But I would like to start with a less
ambitious plan first. By duplicating the common RPC and related code, and
renaming its packages, we can independently build two different artifacts from
the same repo: one for hdfs and one for Yarn+MR. Then we can decide whether
we want to separate these projects completely, making independent releases.

I believe the last project split failed because of the common dependencies in
both MR and HDFS, which meant that changes to RPC etc. were affecting both
upper-level projects. I think we should avoid that by duplicating the needed
common code.
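As an illustration of the package-renaming idea: something like the maven-shade-plugin's class relocation could rewrite the duplicated common classes under an hdfs-private package, so the two artifacts never clash on a shared classpath. This is only a sketch of the approach, not a worked-out build change, and the relocated package names below are hypothetical:

```xml
<!-- Hypothetical sketch: relocate duplicated common RPC classes into an
     hdfs-private namespace when packaging the hdfs-only artifact. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- the shadedPattern package name is made up for this example -->
            <pattern>org.apache.hadoop.ipc</pattern>
            <shadedPattern>org.apache.hadoop.hdfs.internal.ipc</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With something like this in place, the hdfs artifact and the Yarn+MR artifact could each evolve their copy of the RPC code (and its wire version) independently, which is the whole point of the duplication.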

I would like to see what the community thinks, before making detailed plans.

- milind



Re: hdfs project separation

Posted by Doug Cutting <cu...@apache.org>.
On Fri, Oct 11, 2013 at 9:14 AM, Milind Bhandarkar
<mb...@gopivotal.com> wrote:
> If HDFS is released independently, with its own RPC and protocol versions, features such as pluggable namespaces will not have to wait for the next mega-release of the entire stack.

The plan as I understand it is to eventually be able to release
common/hdfs & yarn/mr independently, as two, three, or perhaps four
different products.  Once we've got that down we can consider
splitting into multiple TLPs.  For this to transpire, folks need to
volunteer to create an independent release: establishing a plan,
helping to make the required changes, calling the vote, etc.  Someone
could propose doing this first with HDFS, YARN, or whatever someone
thinks is best.  It would take concerted effort by a few folks, along
with the consent of the rest of the project.

Do you have a detailed plan?  If so, you could share it and start
trying to build consensus around it.

Doug

Re: hdfs project separation

Posted by Milind Bhandarkar <mb...@gopivotal.com>.
Let me add a bit more about the feasibility of this.

I have been doing some experiments by duplicating some common code into hdfs-only and yarn/MR-only trees, and am able to build and use hdfs independently.

Now that Bigtop has matured, we can still produce a single distro in Apache with independently released mr/yarn and hdfs.

That will enable parallel development, and will also reduce the stabilization overload at mega-release time.

If HDFS is released independently, with its own RPC and protocol versions, features such as pluggable namespaces will not have to wait for the next mega-release of the entire stack.

Would love to hear what hdfs developers think about this.

- milind

Sent from my iPhone
