Posted to common-dev@hadoop.apache.org by Dhruba Borthakur <dh...@gmail.com> on 2009/09/25 19:13:14 UTC
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
It is really nice to have wire-compatibility between clients and servers
running different versions of hadoop. The reason we would like this is
because we can allow the same client (Hive, etc) submit jobs to two
different clusters running different versions of hadoop. But I am not stuck
up on the name of the release that supports wire-compatibility, it can be
either 1.0 or something later than that.
API compatibility +1
Data compatibility +1
Job Q compatibility -1
Wire compatibility +0
thanks,
dhruba
On Fri, Sep 25, 2009 at 10:05 AM, Doug Cutting <cu...@apache.org> wrote:
> Sanjay Radia wrote:
>
>> Both Facebook (Dhruba tells me) and Yahoo are suffering badly from the
>> lack of wire compatibility - a major motivation
>> for Yahoo to develop Avro.
>>
>
> Indeed. Wire compatibility is a crucial feature that we should release as
> soon as possible. Perhaps before 1.0 if 1.0 slips, perhaps after if we
> discover that it's harder to implement than we anticipate.
>
> Wire compatibility - open question; but my thoughts are:
>> With the progress we have made on Avro so far I think there is a very
>> good chance to get wire compatibility in 22 which we
>> can then call 0.99 or 1.0. I think it is worth a shot.
>>
>
> +1 It's certainly worth a shot.
>
> 1.0 is fundamentally about being able to upgrade a cluster without changing
> application code, i.e., API compatibility. Wire compatibility will let
> folks, e.g., use a single client library version to talk to clusters running
> different versions, a wonderful feature, but distinct from the fundamental
> goal of 1.0.
>
> In general we should not tie too many features to specific releases in
> advance of their implementation, since that causes releases to slip when
> features slip. Rather, we should work hard to implement high-priority
> features and release periodically, as features are completed and we are able
> to qualify releases. Long-term API compatibility is a very high-priority
> feature. The first release that has APIs that we think we can support
> back-compatibly for perhaps a few years should be called 1.0. Hopefully
> that will also have some other high-priority features, like security, wire
> compatibility, etc. But I don't see the purpose of requiring a specific
> list of high-priority features besides API compatibility before we declare
> 1.0, and doing so could needlessly keep valuable features from users.
>
> Doug
>
--
Connect to me at http://www.facebook.com/dhruba
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/25/09 2:40 PM, "Doug Cutting" <cu...@apache.org> wrote:
> Would it be materially better for you if we waited longer before calling
> a release 1.0, assuming that the same features are released in the same
> order and on the same schedule regardless of the release name?
Yes.
There is something magic for managerial types when you say "This is not 1.0"
that makes them realize that things are far from reliable/stable/practical
from an operations perspective. When you say "1.0", the expectations for the
tool/product/whatever are way higher.
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Doug Cutting <cu...@apache.org>.
Allen Wittenauer wrote:
> Oh, I completely understand. I'm just throwing in a non-developer's
> opinion... because I'm sure I'm not the only one expecting/assuming that 1.0
> == completely stable.
If we have to live up to that expectation then we might never release
1.0! Frankly, I fear the longer we delay a 1.0 release the more we
raise expectations that it will be all things. Rather, I'd like to have
1.0 to mean just one thing: back-compatible APIs until 2.0, with the
expectation that there will be several 1.x releases between. We can add
other things to 1.0, which might push it out further, but I don't see
how that helps things much.
Would it be materially better for you if we waited longer before calling
a release 1.0, assuming that the same features are released in the same
order and on the same schedule regardless of the release name?
Doug
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/25/09 1:18 PM, "Doug Cutting" <cu...@apache.org> wrote:
> The question is not whether wire compatibility is a good thing. The
> question is whether API compatibility is useless without wire
> compatibility and, vice versa, whether wire compatibility is useless
> without API compatibility. They're both valuable features and we should
> get both of them out as soon as feasible.
>
> The question is, if one slips, whether we should hold the other. I
> don't think we should. Hence we should not in advance tie a particular
> release name to both features. That's all I'm saying. I claim that the
> 1.0 moniker is most strongly tied to API compatibility. If we can get
> wire and other sorts of valuable compatibility into that same release,
> then great. If one comes out earlier or later they're both still
> valuable. But neither needs to block the other.
Oh, I completely understand. I'm just throwing in a non-developer's
opinion... because I'm sure I'm not the only one expecting/assuming that 1.0
== completely stable.
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Doug Cutting <cu...@apache.org>.
Allen Wittenauer wrote:
> This is just so disappointing and, quite frankly, makes 1.0 less than useful
> for Real Work. Great, the APIs don't change but you still have the same
> problems of getting data on/off the grid without upgrading your clients
> every time.
>
> To me, without wire compatibility, 1.0 makes me feel pretty "meh; who
> cares--we're still going to be in upgrade hell".
The question is not whether wire compatibility is a good thing. The
question is whether API compatibility is useless without wire
compatibility and, vice versa, whether wire compatibility is useless
without API compatibility. They're both valuable features and we should
get both of them out as soon as feasible.
The question is, if one slips, whether we should hold the other. I
don't think we should. Hence we should not in advance tie a particular
release name to both features. That's all I'm saying. I claim that the
1.0 moniker is most strongly tied to API compatibility. If we can get
wire and other sorts of valuable compatibility into that same release,
then great. If one comes out earlier or later they're both still
valuable. But neither needs to block the other.
Doug
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/25/09 12:44 PM, "Sanjay Radia" <sr...@yahoo-inc.com> wrote:
>
> On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
>
>> On 9/25/09 10:13 AM, "Dhruba Borthakur" <dh...@gmail.com> wrote:
>>> It is really nice to have wire-compatibility between clients and servers
>>> running different versions of hadoop. The reason we would like this is
>>> because we can allow the same client (Hive, etc) submit jobs to two
>>> different clusters running different versions of hadoop. But I am not stuck
>>> up on the name of the release that supports wire-compatibility, it can be
>>> either 1.0 or something later than that.
>>
>> To me, the lack of wire compatibility will make "Hadoop 1.0" 1.0 in name
>> only, when in reality it is more like 0.80. :(
>
> My sentiments exactly, though I could learn to live with it ....
We just had this discussion today about how to put Hadoop into a production
pipeline. I was under the impression that 1.0 was going to be wire
compatible too.
This is just so disappointing and, quite frankly, makes 1.0 less than useful
for Real Work. Great, the APIs don't change but you still have the same
problems of getting data on/off the grid without upgrading your clients
every time.
To me, without wire compatibility, 1.0 makes me feel pretty "meh; who
cares--we're still going to be in upgrade hell".
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
> On 9/25/09 10:13 AM, "Dhruba Borthakur" <dh...@gmail.com> wrote:
> > It is really nice to have wire-compatibility between clients and servers
> > running different versions of hadoop. The reason we would like this is
> > because we can allow the same client (Hive, etc) submit jobs to two
> > different clusters running different versions of hadoop. But I am not stuck
> > up on the name of the release that supports wire-compatibility, it can be
> > either 1.0 or something later than that.
>
> To me, the lack of wire compatibility will make "Hadoop 1.0" 1.0 in name
> only, when in reality it is more like 0.80. :(
My sentiments exactly, though I could learn to live with it ....
>
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/25/09 10:13 AM, "Dhruba Borthakur" <dh...@gmail.com> wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0 or something later than that.
To me, the lack of wire compatibility will make "Hadoop 1.0" 1.0 in name
only, when in reality it is more like 0.80. :(
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Andrew Purtell <ap...@apache.org>.
HBase and similar HDFS clients could benefit from a stable, high-performance
datacenter network protocol built into the namenode and datanodes. Then we
could decouple from Hadoop's versioning and release cycle. HDFS could
decouple from core, etc.
Whatever stable network protocol is devised, if any, should of course
perform as well as, if not better than, the current one. A stable but
lower-performing option, unfortunately, would be excluded from
consideration right away.
HBase is perhaps a bit of a special case currently, in that its access
pattern is random read/write, and there may be only a handful of clients
like that. However, if HDFS is positioned as a product in its own right,
which I believe is the case since the split, there may be many other
potential users of it -- for all of its benefits -- given a stable
wire format that enables decoupled development.
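The decoupling described here can be sketched as a version-negotiated handshake. The following is a hypothetical Python illustration, not Hadoop's actual RPC; all class and method names are invented:

```python
# Hypothetical sketch: client and server agree on the highest protocol
# version both sides understand, so an old client keeps working against
# an upgraded server. Not Hadoop's actual RPC; names are illustrative.

class Server:
    SUPPORTED = {1, 2, 3}  # protocol versions this release can speak

    def negotiate(self, client_versions):
        common = self.SUPPORTED & set(client_versions)
        if not common:
            raise ValueError("no common protocol version")
        return max(common)  # pick the newest version both sides know

class Client:
    def __init__(self, versions):
        self.versions = versions  # versions this client library supports

    def connect(self, server):
        self.proto = server.negotiate(self.versions)
        return self.proto

# A client built against an older release still connects:
old_client = Client([1, 2])
print(old_client.connect(Server()))  # -> 2
```

With a handshake of this kind, a client such as HBase could ship against one client library and talk to servers from several releases, which is the decoupling being asked for.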
API compatibility +1
Data compatibility +1
Wire compatibility +1
Best regards,
Andrew Purtell
Committing Member, HBase Project: hbase.org
________________________________
From: Steve Loughran <st...@apache.org>
To: common-dev@hadoop.apache.org
Sent: Monday, September 28, 2009 3:15:09 AM
Subject: Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Dhruba Borthakur wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0 or something later than that.
> API compatibility +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0
That's stability of the job submission network protocol you are looking for there.
* We need a job submission API that is designed to work over long-haul links and versions
* It does not have to be the same as anything used in-cluster
* It does not actually need to run in the JobTracker. An independent service bridging the stable long-haul API to an unstable datacentre protocol does work, though authentication and user-rights are a troublespot
Similarly, it would be good to have a stable long-haul HDFS protocol, such as FTP or WebDAV. Again, no need to build it into the namenode.
see http://www.slideshare.net/steve_l/long-haul-hadoop
and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Dhruba Borthakur <dh...@gmail.com>.
I think we should not require Job Q compatibility for 1.0 release.
thanks,
dhruba
On Mon, Sep 28, 2009 at 11:06 AM, Sanjay Radia <sr...@yahoo-inc.com> wrote:
>
> On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:
>
>> Dhruba Borthakur wrote:
>> > It is really nice to have wire-compatibility between clients and servers
>> > running different versions of hadoop. The reason we would like this is
>> > because we can allow the same client (Hive, etc) submit jobs to two
>> > different clusters running different versions of hadoop. But I am not
>> > stuck up on the name of the release that supports wire-compatibility, it
>> > can be either 1.0 or something later than that.
>> > API compatibility +1
>> > Data compatibility +1
>> > Job Q compatibility -1
>> > Wire compatibility +0
>>
>>
>> That's stability of the job submission network protocol you are looking
>> for there.
>> * We need a job submission API that is designed to work over long-haul
>> links and versions
>> * It does not have to be the same as anything used in-cluster
>> * It does not actually need to run in the JobTracker. An independent
>> service bridging the stable long-haul API to an unstable datacentre
>> protocol does work, though authentication and user-rights are a
>> troublespot
>>
>>
>
>
> I think you are misinterpreting what Job Q compatibility means.
> It is about jobs already in the queue surviving an upgrade across a
> release.
>
> See my initial proposal on Jan 16th:
>
> https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691
>
> Doug argued that it is nice to have but not required for 1.0 - can be added
> later.
>
>
> sanjay
>
>
>> Similarly, it would be good to have a stable long-haul HDFS protocol, such
>> as FTP or WebDAV. Again, no need to build it into the namenode.
>>
>> see http://www.slideshare.net/steve_l/long-haul-hadoop
>> and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
>>
>>
>
--
Connect to me at http://www.facebook.com/dhruba
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:
> Dhruba Borthakur wrote:
> > It is really nice to have wire-compatibility between clients and servers
> > running different versions of hadoop. The reason we would like this is
> > because we can allow the same client (Hive, etc) submit jobs to two
> > different clusters running different versions of hadoop. But I am not stuck
> > up on the name of the release that supports wire-compatibility, it can be
> > either 1.0 or something later than that.
> > API compatibility +1
> > Data compatibility +1
> > Job Q compatibility -1
> > Wire compatibility +0
>
>
> That's stability of the job submission network protocol you are looking
> for there.
> * We need a job submission API that is designed to work over long-haul
> links and versions
> * It does not have to be the same as anything used in-cluster
> * It does not actually need to run in the JobTracker. An independent
> service bridging the stable long-haul API to an unstable datacentre
> protocol does work, though authentication and user-rights are a
> troublespot
>
I think you are misinterpreting what Job Q compatibility means.
It is about jobs already in the queue surviving an upgrade across a
release.
See my initial proposal on Jan 16th:
https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691
Doug argued that it is nice to have but not required for 1.0 - can be
added later.
sanjay
>
> Similarly, it would be good to have a stable long-haul HDFS protocol, such
> as FTP or WebDAV. Again, no need to build it into the namenode.
>
> see http://www.slideshare.net/steve_l/long-haul-hadoop
> and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
>
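Sanjay's definition of Job Q compatibility, queued jobs surviving an upgrade, implies the persisted job state must stay readable across releases. A minimal Python sketch of one way to do that, with a format version plus defaults for new fields; the field names and format here are hypothetical, not Hadoop's actual job-state format:

```python
import json

# Hypothetical sketch: the older release persists queued-job state with an
# explicit format version; the newer release's reader fills in defaults
# for fields that did not exist yet, so queued jobs survive the upgrade.

def write_job_v1(job_id, priority):
    # Writer in release N (format 1: no "queue" field yet).
    return json.dumps({"format": 1, "job_id": job_id, "priority": priority})

def read_job(blob):
    # Reader in release N+1 (understands formats 1 and 2).
    rec = json.loads(blob)
    if rec["format"] < 2:
        rec.setdefault("queue", "default")  # field introduced in format 2
    return rec

job = read_job(write_job_v1("job_001", 5))
print(job["queue"])  # -> default
```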
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Posted by Steve Loughran <st...@apache.org>.
Dhruba Borthakur wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0 or something later than that.
> API compatibility +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0
That's stability of the job submission network protocol you are looking
for there.
* We need a job submission API that is designed to work over long-haul
links and versions
* It does not have to be the same as anything used in-cluster
* It does not actually need to run in the JobTracker. An independent
service bridging the stable long-haul API to an unstable datacentre
protocol does work, though authentication and user-rights are a troublespot
Similarly, it would be good to have a stable long-haul HDFS protocol, such
as FTP or WebDAV. Again, no need to build it into the namenode.
see http://www.slideshare.net/steve_l/long-haul-hadoop
and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
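The independent bridging service proposed in this thread can be sketched roughly as follows. This is a hypothetical Python illustration (all class and method names invented): only the bridge's surface is promised to long-haul clients, while the in-cluster protocol stays free to change between releases:

```python
# Hypothetical sketch of the bridging-service idea: a small, stable
# long-haul submission API in front of a cluster-internal protocol that
# may change between releases. All names here are illustrative.

class InternalJobTracker:
    """Stands in for the unstable in-cluster protocol."""
    def __init__(self):
        self.jobs = {}

    def submit_current(self, user, conf):
        job_id = f"job_{len(self.jobs) + 1:04d}"
        self.jobs[job_id] = (user, conf)
        return job_id

class SubmissionBridge:
    """The stable long-haul surface; only this is promised to clients."""
    def __init__(self, tracker):
        self.tracker = tracker

    def submit(self, user, jar_url, args):
        # Authentication and user-rights checks would go here, the
        # troublespot noted in the thread.
        conf = {"jar": jar_url, "args": args}
        return self.tracker.submit_current(user, conf)

bridge = SubmissionBridge(InternalJobTracker())
print(bridge.submit("alice", "hdfs:///apps/wc.jar", ["in", "out"]))  # -> job_0001
```

When the in-cluster protocol changes, only the bridge needs updating; long-haul clients keep calling the same submit() API.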