Posted to dev@bigtop.apache.org by Roman Shaposhnik <rv...@apache.org> on 2011/09/24 02:34:43 UTC

Re: Rejuvenate Hadoop 0.22 effort

On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> == TESTING ==
> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> great. Will be glad to see any steps in this direction.

The basic integration is done. We can produce fully functional RPM
and DEB packages for Hadoop 0.22 release.

This is good news. The bad news is that very few downstream components
can be compiled against .22. And I'm not talking about changes to versions,
pom.xml and build.xml files. I'm talking about API incompatibilities. Pig,
Hive, HBase, Mahout all need to be modified to support .22. Before that's
done, there's little that can be done as far as stack validation is concerned.

Given that work needs to be done in downstream components, I've got 2 questions:
   1. Do we know if the API delta between .22 and .23 is as significant as
   between .22 and .20.2?

   2. What's the common approach downstream to support multiple versions of
   Hadoop APIs? Or is this even something that can be asked of all the
   components?

Thanks,
Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Konstantin Shvachko <sh...@gmail.com>.
>> About connecting with other projects.
>> HBase is compiling with 0.22.
>
> That is the trunk of HBase, right? IOW, we don't really have a released version
> that is compatible with .22?

I mean HBase 0.92, which was branched recently.


>> For Pig there is
>> https://issues.apache.org/jira/browse/PIG-2277
>> For Hive created
>> https://issues.apache.org/jira/browse/HIVE-2468
>>
>> The direction with Hive and Pig is to create shim layers for different versions.
>
> IOW, a single build of Hive and Pig being able to communicate
> with different versions of Hadoop?
>
> This is fine, but sounds more time consuming than what HBase is
> doing (providing profiles to build against different versions of Hadoop).
>
> Regardless of how time consuming either approach is, I guess my
> fundamental question would be -- do we have any kind of commitment
> from the downstream guys to have a release compatible with .22?
>
> I guess I'm just wondering how these timelines of downstream
> components will affect usability of any Hadoop release (be it .22 or .23).
> Any thoughts on that?
>
> Thanks,
> Roman.
>

Re: Rejuvenate Hadoop 0.22 effort

Posted by Roman Shaposhnik <rv...@apache.org>.
On Sun, Sep 25, 2011 at 6:02 PM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> Good news Roman!
>
> About connecting with other projects.
> HBase is compiling with 0.22.

That is the trunk of HBase, right? IOW, we don't really have a released version
that is compatible with .22?

> For Pig there is
> https://issues.apache.org/jira/browse/PIG-2277
> For Hive created
> https://issues.apache.org/jira/browse/HIVE-2468
>
> The direction with Hive and Pig is to create shim layers for different versions.

IOW, a single build of Hive and Pig being able to communicate
with different versions of Hadoop?

This is fine, but sounds more time consuming than what HBase is
doing (providing profiles to build against different versions of Hadoop).

Regardless of how time consuming either approach is, I guess my
fundamental question would be -- do we have any kind of commitment
from the downstream guys to have a release compatible with .22?

I guess I'm just wondering how these timelines of downstream
components will affect usability of any Hadoop release (be it .22 or .23).
Any thoughts on that?

Thanks,
Roman.
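
To make the "single build talking to different versions of Hadoop" idea from this message concrete, here is a minimal sketch of a shim layer that picks version-specific code at run time. It is written in Python purely for brevity, and it is not how Hive or Pig actually implement their shims: every name in it (`Shim020`, `Shim022`, `SHIMS`, `api_line`, `load_shim`) is hypothetical, and the shim bodies are placeholders for whatever calls differ between API lines.

```python
import re


class Shim020:
    """Hypothetical stand-in for code written against the 0.20 API line."""
    line = "0.20"

    def describe(self) -> str:
        return "calls routed through the 0.20-style API"


class Shim022:
    """Hypothetical stand-in for code written against the 0.22 API line."""
    line = "0.22"

    def describe(self) -> str:
        return "calls routed through the 0.22-style API"


# Registry of available shims, keyed by 'major.minor' API line (assumed layout).
SHIMS = {"0.20": Shim020, "0.22": Shim022}


def api_line(version: str) -> str:
    """Reduce a full version string such as '0.20.2' to its 'major.minor' line."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Hadoop version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def load_shim(version: str):
    """Return a shim instance for the detected version, or fail loudly."""
    line = api_line(version)
    try:
        return SHIMS[line]()
    except KeyError:
        raise LookupError(f"no shim registered for Hadoop {line}") from None


if __name__ == "__main__":
    # In a real component the version would be detected from the cluster or
    # classpath; here it is hard-coded for illustration.
    shim = load_shim("0.22.0")
    print(shim.line, "->", shim.describe())
```

The trade-off against build profiles is roughly this: a shim layer ships one artifact and decides at run time, at the cost of maintaining the shim interface, while a build profile decides at compile time and yields one artifact per Hadoop line.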

Re: Rejuvenate Hadoop 0.22 effort

Posted by Konstantin Shvachko <sh...@gmail.com>.
Good news Roman!

About connecting with other projects.
HBase is compiling with 0.22.
For Pig there is
https://issues.apache.org/jira/browse/PIG-2277
For Hive created
https://issues.apache.org/jira/browse/HIVE-2468

The direction with Hive and Pig is to create shim layers for different versions.

Don't know about the API delta between .22 and .23 yet. I assume it is
less than 0.20 vs 0.22. But I may be wrong.

--Konstantin

On Fri, Sep 23, 2011 at 5:34 PM, Roman Shaposhnik <rv...@apache.org> wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <sh...@gmail.com> wrote:
>> == TESTING ==
>> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
>> great. Will be glad to see any steps in this direction.
>
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
>
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking about changes to versions,
> pom.xml and build.xml files. I'm talking about API incompatibilities. Pig,
> Hive, HBase, Mahout all need to be modified to support .22. Before that's
> done, there's little that can be done as far as stack validation is concerned.
>
> Given that work needs to be done in downstream components, I've got 2 questions:
>   1. Do we know if the API delta between .22 and .23 is as significant as
>   between .22 and .20.2?
>
>   2. What's the common approach downstream to support multiple versions of
>   Hadoop APIs? Or is this even something that can be asked of all the
>   components?
>
> Thanks,
> Roman.
>

Re: Rejuvenate Hadoop 0.22 effort

Posted by Roman Shaposhnik <rv...@apache.org>.
On Fri, Sep 23, 2011 at 7:40 PM, Konstantin Boudnik <co...@apache.org> wrote:
> I'd say let's take a look at how bad the problems are; what are the discrepancies?

Excellent point! In fact, let me hook these jobs up to Bigtop's Jenkins so that
this info can be seen by anybody who wants to. I'll take care of it tomorrow.

Thanks,
Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Roman Shaposhnik <rv...@apache.org>.
On Fri, Sep 23, 2011 at 7:40 PM, Konstantin Boudnik <co...@apache.org> wrote:
> I'd say let's take a look at how bad the problems are; what are the discrepancies?
>
> Do you have any build links or some such to point to?

Sorry. Took me a bit longer to hook up everything to our Jenkins. Here's the
URL for the matrix job that is trying to compile everything in Bigtop against
Hadoop 0.22:
   http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/

I disabled HBase for now, since it is compiling perfectly and I don't want to
waste time on it.

Otherwise -- it would be extremely nice if MapReduce folks could suggest patches
to make these things compile. Things like these:
   http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/COMPONENT=sqoop,label=centos5/2/console
   http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/COMPONENT=mahout,label=centos5/2/console

Must be pretty trivial.

Thanks,
Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Konstantin Boudnik <co...@apache.org>.
I'd say let's take a look at how bad the problems are; what are the discrepancies?

Do you have any build links or some such to point to?
  Cos
   
On Fri, Sep 23, 2011 at 05:34PM, Roman Shaposhnik wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <sh...@gmail.com> wrote:
> > == TESTING ==
> > 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> > great. Will be glad to see any steps in this direction.
> 
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
> 
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking about changes to versions,
> pom.xml and build.xml files. I'm talking about API incompatibilities. Pig,
> Hive, HBase, Mahout all need to be modified to support .22. Before that's
> done, there's little that can be done as far as stack validation is concerned.
> 
> Given that work needs to be done in downstream components, I've got 2 questions:
>    1. Do we know if the API delta between .22 and .23 is as significant as
>    between .22 and .20.2?
>
>    2. What's the common approach downstream to support multiple versions of
>    Hadoop APIs? Or is this even something that can be asked of all the
>    components?
> 
> Thanks,
> Roman.
