Posted to common-dev@hadoop.apache.org by Dhruba Borthakur <dh...@gmail.com> on 2009/09/25 19:13:14 UTC

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

It is really nice to have wire-compatibility between clients and servers
running different versions of hadoop. The reason we would like this is
because we can allow the same client (Hive, etc) submit jobs to two
different clusters running different versions of hadoop. But I am not stuck
up on the name of the release that supports wire-compatibility, it can be
either 1.0  or something later than that.
API compatibility  +1
Data compatibility +1
Job Q compatibility -1
Wire compatibility +0

thanks,
dhruba


On Fri, Sep 25, 2009 at 10:05 AM, Doug Cutting <cu...@apache.org> wrote:

> Sanjay Radia wrote:
>
>> Both Facebook (Dhruba tells me) and Yahoo are suffering badly from the
lack of wire compatibility - a major motivation
>> for Yahoo to develop Avro.
>>
>
> Indeed.  Wire compatibility is a crucial feature that we should release as
> soon as possible.  Perhaps before 1.0 if 1.0 slips, perhaps after if we
> discover that it's harder to implement than we anticipate.
>
>  Wire compatibility - open question; but my thoughts are:
>>     With the progress we have made on Avro so far I think there is a very
>> good chance to get wire compatibility in 22 which we
>> can then call 0.99 or 1.0. I think it is worth a shot.
>>
>
> +1 It's certainly worth a shot.
>
> 1.0 is fundamentally about being able to upgrade a cluster without changing
> application code, i.e., API compatibility.  Wire compatibility will let
> folks, e.g., use a single client library version to talk to clusters running
> different versions, a wonderful feature, but distinct from the fundamental
> goal of 1.0.
>
> In general we should not tie too many features to specific releases in
> advance of their implementation, since that causes releases to slip when
> features slip.  Rather, we should work hard to implement high-priority
> features and release periodically, as features are completed and we are able
> to qualify releases.  Long-term API compatibility is a very high-priority
> feature.  The first release that has APIs that we think we can support
> back-compatibly for perhaps a few years should be called 1.0.  Hopefully
> that will also have some other high-priority features, like security, wire
> compatibility, etc.  But I don't see the purpose of requiring a specific
> list of high-priority features besides API compatibility before we declare
> 1.0, and doing so could needlessly keep valuable features from users.
>
> Doug
>



-- 
Connect to me at http://www.facebook.com/dhruba

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/25/09 2:40 PM, "Doug Cutting" <cu...@apache.org> wrote:
> Would it be materially better for you if we waited longer before calling
> a release 1.0, assuming that the same features are released in the same
> order and on the same schedule regardless of the release name?

Yes.

There is something magic to managerial types when you say "This is not 1.0"
that makes them realize that things are far from reliable/stable/practical
from an operations perspective.  When you say "1.0", expectations for the
tool/product/whatever are way higher. 


Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Doug Cutting <cu...@apache.org>.
Allen Wittenauer wrote:
> Oh, I completely understand.  I'm just throwing in a non-developer's
> opinion... because I'm sure I'm not the only one expecting/assuming that 1.0
> == completely stable.  

If we have to live up to that expectation then we might never release 
1.0!  Frankly, I fear the longer we delay a 1.0 release the more we 
raise expectations that it will be all things.  Rather, I'd like to have 
1.0 to mean just one thing: back-compatible APIs until 2.0, with the 
expectation that there will be several 1.x releases between.  We can add 
other things to 1.0, which might push it out further, but I don't see 
how that helps things much.

Would it be materially better for you if we waited longer before calling 
a release 1.0, assuming that the same features are released in the same 
order and on the same schedule regardless of the release name?

Doug

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Allen Wittenauer <aw...@linkedin.com>.


On 9/25/09 1:18 PM, "Doug Cutting" <cu...@apache.org> wrote:
> The question is not whether wire compatibility is a good thing.  The
> question is whether API compatibility is useless without wire
> compatibility and, vice versa, whether wire compatibility is useless
> without API compatibility.  They're both valuable features and we should
> get both of them out as soon as feasible.
> 
> The question is whether, if one slips, we should hold the other.  I
> don't think we should.  Hence we should not in advance tie a particular
> release name to both features.  That's all I'm saying.  I claim that the
> 1.0 moniker is most strongly tied to API compatibility.  If we can get
> wire and other sorts of valuable compatibility into that same release,
> then great.  If one comes out earlier or later they're both still
> valuable.  But neither needs to block the other.

Oh, I completely understand.  I'm just throwing in a non-developer's
opinion... because I'm sure I'm not the only one expecting/assuming that 1.0
== completely stable.  


Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Doug Cutting <cu...@apache.org>.
Allen Wittenauer wrote:
> This is just so disappointing and, quite frankly, makes 1.0 less than useful
> for Real Work.  Great, the APIs don't change but you still have the same
> problems of getting data on/off the grid without upgrading your clients
> every time. 
> 
> To me, without wire compatibility, 1.0 makes me feel pretty "meh; who
> cares--we're still going to be in upgrade hell".

The question is not whether wire compatibility is a good thing.  The 
question is whether API compatibility is useless without wire 
compatibility and, vice versa, whether wire compatibility is useless 
without API compatibility.  They're both valuable features and we should 
get both of them out as soon as feasible.

The question is whether, if one slips, we should hold the other.  I 
don't think we should.  Hence we should not in advance tie a particular 
release name to both features.  That's all I'm saying.  I claim that the 
1.0 moniker is most strongly tied to API compatibility.  If we can get 
wire and other sorts of valuable compatibility into that same release, 
then great.  If one comes out earlier or later they're both still 
valuable.  But neither needs to block the other.

Doug

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Allen Wittenauer <aw...@linkedin.com>.


On 9/25/09 12:44 PM, "Sanjay Radia" <sr...@yahoo-inc.com> wrote:

> 
> On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
> 
>> On 9/25/09 10:13 AM, "Dhruba Borthakur" <dh...@gmail.com> wrote:
>>> It is really nice to have wire-compatibility between clients and
>> servers
>>> running different versions of hadoop. The reason we would like
>> this is
>>> because we can allow the same client (Hive, etc) submit jobs to two
>>> different clusters running different versions of hadoop. But I am
>> not stuck
>>> up on the name of the release that supports wire-compatibility, it
>> can be
>>> either 1.0  or something later than that.
>> 
>> To me, the lack of wire compatibility will make "Hadoop 1.0" in name
>> only when in reality it is more like 0.80. :(
> 
> My sentiments exactly, though I could learn to live with it ....

We just had this discussion today about how to put Hadoop into a production
pipeline.  I was under the impression that 1.0 was going to be wire
compatible too.

This is just so disappointing and, quite frankly, makes 1.0 less than useful
for Real Work.  Great, the APIs don't change but you still have the same
problems of getting data on/off the grid without upgrading your clients
every time. 

To me, without wire compatibility, 1.0 makes me feel pretty "meh; who
cares--we're still going to be in upgrade hell".


Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:

> On 9/25/09 10:13 AM, "Dhruba Borthakur" <dh...@gmail.com> wrote:
> > It is really nice to have wire-compatibility between clients and  
> servers
> > running different versions of hadoop. The reason we would like  
> this is
> > because we can allow the same client (Hive, etc) submit jobs to two
> > different clusters running different versions of hadoop. But I am  
> not stuck
> > up on the name of the release that supports wire-compatibility, it  
> can be
> > either 1.0  or something later than that.
>
> To me, the lack of wire compatibility will make "Hadoop 1.0" in name
> only when in reality it is more like 0.80. :(

My sentiments exactly, though I could learn to live with it ....


>


Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/25/09 10:13 AM, "Dhruba Borthakur" <dh...@gmail.com> wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0  or something later than that.

To me, the lack of wire compatibility will make "Hadoop 1.0" in name
only when in reality it is more like 0.80. :(


Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Andrew Purtell <ap...@apache.org>.
HBase and similar HDFS clients could benefit from a high-performance,
stable datacenter network protocol built into the namenode and
datanodes. Then we could decouple from the Hadoop versioning and release
cycle. HDFS could decouple from core, etc. 

Whatever stable network protocol is devised, if any, should of course
perform as well as, if not better than, the current one. A stable but
lower-performing option, unfortunately, would be excluded from
consideration right away. 

HBase is a bit of a special case currently, perhaps, in that its access
pattern is random read/write, and there may be only a handful of clients
like that. However, if HDFS is positioned as a product in its own right,
which I believe is the case since the split, there may be many other
potential users of it -- for all of its benefits -- given a stable 
wire format that enables decoupled development. 

API compatibility  +1
Data compatibility +1
Wire compatibility +1

Best regards,

Andrew Purtell
Committing Member, HBase Project: hbase.org
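
The wire and data compatibility that Andrew's +1s call for is what the
thread's earlier Avro discussion is about: a reader built against an old
schema resolves a newer writer's record by field name, ignoring unknown
fields and filling missing ones with defaults. A minimal sketch of that
idea, using plain Python dicts as a stand-in for Avro records (the field
names here are invented for illustration, not real Hadoop RPC fields):

```python
# Hypothetical sketch of schema-evolution-based wire compatibility:
# an old client keeps working against a newer server because records
# are resolved field-by-field against the reader's own schema.

READER_SCHEMA_V1 = {      # what an old client understands
    "path": None,         # required field (no default)
    "length": 0,          # default used if the writer omits it
}

def resolve(record, reader_schema):
    """Project a writer's record onto the reader's schema: fields the
    reader doesn't know are dropped, missing fields take defaults."""
    out = {}
    for field, default in reader_schema.items():
        if field in record:
            out[field] = record[field]
        elif default is not None:
            out[field] = default
        else:
            raise ValueError(f"incompatible record: missing {field!r}")
    return out

# A newer server replies with a field the old client never knew about;
# the extra field is simply ignored rather than breaking the client.
reply_from_v2_server = {"path": "/user/hive/warehouse", "length": 4096,
                        "storage_policy": "HOT"}   # new in "v2"
print(resolve(reply_from_v2_server, READER_SCHEMA_V1))
```

Real Avro does this with declared schemas on both sides rather than ad-hoc
dicts, but the resolution rule is the same, and it is what lets one client
library version talk to clusters running different releases.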





________________________________
From: Steve Loughran <st...@apache.org>
To: common-dev@hadoop.apache.org
Sent: Monday, September 28, 2009 3:15:09 AM
Subject: Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Dhruba Borthakur wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0  or something later than that.
> API compatibility  +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0


That's stability of the job submission network protocol you are looking for there.
* We need a job submission API that is designed to work over long-haul links and versions
* It does not have to be the same as anything used in-cluster
* It does not actually need to run in the JobTracker. An independent service bridging the stable long-haul API to an unstable datacentre protocol does work, though authentication and user-rights are a troublespot

Similarly, it would be good to have a stable long-haul HDFS protocol, such as FTP or webdav. Again, no need to build it into the namenode.

see http://www.slideshare.net/steve_l/long-haul-hadoop
and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop




Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Dhruba Borthakur <dh...@gmail.com>.
I think we should not require Job Q compatibility for 1.0 release.

thanks,
dhruba


On Mon, Sep 28, 2009 at 11:06 AM, Sanjay Radia <sr...@yahoo-inc.com> wrote:

>
> On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:
>
>  Dhruba Borthakur wrote:
>> > It is really nice to have wire-compatibility between clients and servers
>> > running different versions of hadoop. The reason we would like this is
>> > because we can allow the same client (Hive, etc) submit jobs to two
>> > different clusters running different versions of hadoop. But I am not
>> stuck
>> > up on the name of the release that supports wire-compatibility, it can
>> be
>> > either 1.0  or something later than that.
>> > API compatibility  +1
>> > Data compatibility +1
>> > Job Q compatibility -1
>> > Wire compatibility +0
>>
>>
>> That's stability of the job submission network protocol you are looking
>> for there.
>>  * We need a job submission API that is designed to work over long-haul
>> links and versions
>>  * It does not have to be the same as anything used in-cluster
>>  * It does not actually need to run in the JobTracker. An independent
>> service bridging the stable long-haul API to an unstable datacentre
>> protocol does work, though authentication and user-rights are a
>> troublespot
>>
>>
>
>
> I think you are misinterpreting what Job Q compatibility means.
> It is about jobs already in the queue surviving an upgrade across a
> release.
>
> See my initial proposal on Jan 16th:
>
> https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691
>
> Doug argued that it is nice to have but not required for 1.0 - can be added
> later.
>
>
> sanjay
>
>
>> Similarly, it would be good to have a stable long-haul HDFS protocol, such
>> as FTP or webdav. Again, no need to build it into the namenode.
>>
>> see http://www.slideshare.net/steve_l/long-haul-hadoop
>> and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
>>
>>
>


-- 
Connect to me at http://www.facebook.com/dhruba

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:

> Dhruba Borthakur wrote:
> > It is really nice to have wire-compatibility between clients and  
> servers
> > running different versions of hadoop. The reason we would like  
> this is
> > because we can allow the same client (Hive, etc) submit jobs to two
> > different clusters running different versions of hadoop. But I am  
> not stuck
> > up on the name of the release that supports wire-compatibility, it  
> can be
> > either 1.0  or something later than that.
> > API compatibility  +1
> > Data compatibility +1
> > Job Q compatibility -1
> > Wire compatibility +0
>
>
> That's stability of the job submission network protocol you are looking
> for there.
>   * We need a job submission API that is designed to work over long-haul
> links and versions
>   * It does not have to be the same as anything used in-cluster
>   * It does not actually need to run in the JobTracker. An independent
> service bridging the stable long-haul API to an unstable datacentre
> protocol does work, though authentication and user-rights are a  
> troublespot
>



I think you are misinterpreting what Job Q compatibility means.
It is about jobs already in the queue surviving an upgrade across a  
release.

See my initial proposal on Jan 16th:
https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691

Doug argued that it is nice to have but not required for 1.0 - can be  
added later.


sanjay
>
> Similarly, it would be good to have a stable long-haul HDFS protocol, such
> as FTP or webdav. Again, no need to build it into the namenode.
>
> see http://www.slideshare.net/steve_l/long-haul-hadoop
> and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
>


Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Posted by Steve Loughran <st...@apache.org>.
Dhruba Borthakur wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0  or something later than that.
> API compatibility  +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0


That's stability of the job submission network protocol you are looking 
for there.
  * We need a job submission API that is designed to work over long-haul 
links and versions
  * It does not have to be the same as anything used in-cluster
  * It does not actually need to run in the JobTracker. An independent 
service bridging the stable long-haul API to an unstable datacentre 
protocol does work, though authentication and user-rights are a troublespot

Similarly, it would be good to have a stable long-haul HDFS protocol, such 
as FTP or webdav. Again, no need to build it into the namenode.

see http://www.slideshare.net/steve_l/long-haul-hadoop
and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
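
Steve's bridge idea can be sketched in a few lines: callers see one
stable long-haul submission API forever, and only the bridge is rebuilt
when a cluster's in-cluster protocol changes. Every class and method
name below is invented for illustration; this is not real Hadoop API:

```python
# Hypothetical sketch of an independent bridge service that maps a
# stable long-haul job submission API onto whatever unstable in-cluster
# protocol each target cluster speaks.

class ClusterProtocolV20:
    def submit(self, jar, conf):           # one release's protocol
        return f"v20:{jar}"

class ClusterProtocolV21:
    def submit_job(self, job_spec):        # same task, different API
        return f"v21:{job_spec['jar']}"

class SubmissionBridge:
    """Stable facade: clients call submit(jar, conf) regardless of
    which cluster version sits behind the bridge."""
    def __init__(self, backend):
        self.backend = backend

    def submit(self, jar, conf=None):
        if isinstance(self.backend, ClusterProtocolV20):
            return self.backend.submit(jar, conf or {})
        return self.backend.submit_job({"jar": jar, "conf": conf or {}})

# The same client code works against clusters on different releases.
for cluster in (ClusterProtocolV20(), ClusterProtocolV21()):
    print(SubmissionBridge(cluster).submit("wordcount.jar"))
```

As the thread notes, the translation itself is the easy part; passing
authentication and user rights through the bridge is the trouble spot.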