Posted to general@hadoop.apache.org by Konstantin Shvachko <sh...@gmail.com> on 2011/09/16 11:21:03 UTC

Rejuvenate Hadoop 0.22 effort

Hi everybody,

I see no need to drastically change the plans for the Hadoop 0.22
release, so I'll continue along the lines previously laid out by
Nigel, discussed, and agreed upon within the community.

1. First thing, we need to resurrect Hadoop-0.22 Jenkins builds ASAP.
Does anybody want to help with that? Any help is greatly appreciated.

2. I will start sorting out jiras currently assigned to 0.22. There
are 10 blockers (again) across the three projects. Details below. My
plan is to get the release candidate out in late October.

3. Let's start discussing what else people think needs to be included
in 0.22. I will include issues based on the following priorities
- build fixes,
- test failures,
- bug fixes,
- documentation,
- compatibility issues directed to making H-0.22 work with other
projects (HBase, Pig, Hive, Oozie)
- minor improvements (irritating for users but simple)
- no new features in 0.22.0, but I'd like to have a list of things
which people would've considered for 0.22.1

4. I will use the following filter to watch the jiras assigned to the release:
project in (HADOOP, HDFS, MAPREDUCE) AND resolution = Unresolved AND
fixVersion = "0.22.0" ORDER BY priority DESC
If you think an issue should be considered for inclusion, please set
fixVersion = "0.22.0". I will mark them as blockers based on the
priorities above and my common sense.
Note: if the jira is consciously assigned to a contributor, it has a
high chance of making it into the blockers.
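
A minimal sketch of turning the filter above into a shareable link. The
JQL string is copied from the message; the issue-navigator path and its
`jql` query parameter are assumptions about the ASF Jira instance, and
`filter_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# JQL copied verbatim from the filter in the message above.
JQL = (
    'project in (HADOOP, HDFS, MAPREDUCE) AND resolution = Unresolved AND '
    'fixVersion = "0.22.0" ORDER BY priority DESC'
)

# Assumption: the issue navigator accepts a `jql` query parameter at this path.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/"


def filter_url(jql: str = JQL, base: str = JIRA_SEARCH) -> str:
    """Return a URL that opens the given JQL query (hypothetical helper)."""
    # urlencode percent-escapes quotes, parentheses, '=' and spaces.
    return base + "?" + urlencode({"jql": jql})


if __name__ == "__main__":
    print(filter_url())
```

Setting fixVersion = "0.22.0" on an unresolved issue is what makes it
show up in this query.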

== TESTING ==
5. I think Steve's idea of integrating 0.22 with Apache BigTop is
great. Will be glad to see any steps in this direction.

6. Hadoop-0.22 has been in testing since January 2011. We conducted some
internal testing lately. Testing is proceeding now on a dev cluster.
If anybody plans to set up a cluster for testing and wants to
coordinate the efforts, please ping me.

== 10 BLOCKERS ==
7. There are 10 official blockers.

Key              Assignee       Summary
MAPREDUCE-1991   Todd Lipcon    taskcontroller allows stealing permissions on any local file
MAPREDUCE-2178   Devaraj Das    Race condition in LinuxTaskController permissions handling
MAPREDUCE-2266   Unassigned     JvmManager sleeps between SIGTERM and SIGKILL while holding many TT locks

	I will unblock TaskController issues as per discussion related to MAPREDUCE-2767.

MAPREDUCE-1100   Vinod Kumar    User's task-logs filling up local disks on the TaskTrackers
MAPREDUCE-1716   Vinod Kumar    MAPREDUCE-1100 Truncate logs of finished tasks to prevent node thrash due to excessive logging

	Don't see any activity from Vinod. Any volunteers to port this to 0.22?

HADOOP-7035      Tom White      Document incompatible API changes between releases

	Looks like close to completion. Tom are you still on it?

MAPREDUCE-2268   Todd Lipcon    With JVM reuse, JvmManager doesn't delete last workdir properly

	Todd, is it a blocker? Do you plan to fix it soon?

MAPREDUCE-1506   Unassigned     Assertion failure in TestTaskTrackerMemoryManager

	Will unblock, as no volunteers emerged.

HDFS-1967        Unassigned     HDFS-1852 TestHDFSTrash failing on trunk and 22
HDFS-2012        Unassigned     Recurring failure of TestBalancer on branch-0.22

	Don't see failures anymore. Will follow up when Jenkins builds are restored

HDFS-2290        Benoy Antony   Block with corrupt replica is not getting replicated

	Close to completion.

Thanks,
--Konstantin

Re: Rejuvenate Hadoop 0.22 effort

Posted by Konstantin Shvachko <sh...@gmail.com>.

>> About connecting with other projects.
>> HBase is compiling with 0.22.
>
> That is trunk of HBase right? IOW, we don't really have a released version
> that is compatible with .22?

I mean HBase 0.92, which was branched recently.


>> For Pig there is
>> https://issues.apache.org/jira/browse/PIG-2277
>> For Hive created
>> https://issues.apache.org/jira/browse/HIVE-2468
>>
>> The direction with Hive and Pig is to create shim layers for different versions.
>
> IOW, a single build of Hive and Pig being able to communicate
> with different versions of Hadoop?
>
> This is fine, but sounds more time consuming than what HBase is
> doing (providing profiles to build against different versions of Hadoop).
>
> Regardless of how time consuming either approach is, I guess my
> fundamental question would be -- do we have any kind of commitment
> from the downstream guys to have a release compatible with .22?
>
> I guess I'm just wondering how these timelines of downstream
> components will affect usability of any Hadoop release (be it .22 or .23).
> Any thoughts on that?
>
> Thanks,
> Roman.
>

Re: Rejuvenate Hadoop 0.22 effort

Posted by Roman Shaposhnik <rv...@apache.org>.

On Sun, Sep 25, 2011 at 6:02 PM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> Good news Roman!
>
> About connecting with other projects.
> HBase is compiling with 0.22.

That is trunk of HBase, right? IOW, we don't really have a released version
that is compatible with .22?

> For Pig there is
> https://issues.apache.org/jira/browse/PIG-2277
> For Hive created
> https://issues.apache.org/jira/browse/HIVE-2468
>
> The direction with Hive and Pig is to create shim layers for different versions.

IOW, a single build of Hive and Pig being able to communicate
with different versions of Hadoop?

This is fine, but sounds more time consuming than what HBase is
doing (providing profiles to build against different versions of Hadoop).
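
To make the distinction concrete, here is a minimal sketch of the shim
idea: one build ships an adapter per supported Hadoop line and picks one
at runtime from the detected version. All names here (HadoopShim,
Shim020, Shim022, load_shim) are hypothetical and are not taken from the
Hive or Pig code bases; the sketch is in Python only for brevity.

```python
from abc import ABC, abstractmethod


class HadoopShim(ABC):
    """Hypothetical interface hiding version-specific Hadoop calls."""

    @abstractmethod
    def describe(self) -> str:
        ...


class Shim020(HadoopShim):
    def describe(self) -> str:
        return "calls written against the 0.20 API"


class Shim022(HadoopShim):
    def describe(self) -> str:
        return "calls written against the 0.22 API"


# Map a major.minor prefix to the shim class that handles it.
SHIMS = {"0.20": Shim020, "0.22": Shim022}


def load_shim(hadoop_version: str) -> HadoopShim:
    """Pick a shim at runtime from the detected Hadoop version string."""
    major_minor = ".".join(hadoop_version.split(".")[:2])
    try:
        return SHIMS[major_minor]()
    except KeyError:
        raise ValueError(f"no shim for Hadoop {hadoop_version}") from None
```

With build profiles, by contrast, the choice is made at compile time and
each resulting artifact works with only one Hadoop line.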

Regardless of how time consuming either approach is, I guess my
fundamental question would be -- do we have any kind of commitment
from the downstream guys to have a release compatible with .22?

I guess I'm just wondering how these timelines of downstream
components will affect usability of any Hadoop release (be it .22 or .23).
Any thoughts on that?

Thanks,
Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Konstantin Shvachko <sh...@gmail.com>.

Good news Roman!

About connecting with other projects.
HBase is compiling with 0.22.
For Pig there is
https://issues.apache.org/jira/browse/PIG-2277
For Hive created
https://issues.apache.org/jira/browse/HIVE-2468

The direction with Hive and Pig is to create shim layers for different versions.

Don't know about the API delta between .22 and .23 yet. I assume it is
less than 0.20 vs 0.22. But I may be wrong.

--Konstantin

On Fri, Sep 23, 2011 at 5:34 PM, Roman Shaposhnik <rv...@apache.org> wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <sh...@gmail.com> wrote:
>> == TESTING ==
>> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
>> great. Will be glad to see any steps in this direction.
>
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
>
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking changes to versions, pom.xml
> and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
> all need to be modified to support .22. Before that's done -- there's little
> that can be done as far as stack validation is concerned.
>
> Given that work needs to be done in downstream components, I've got 2 questions:
>   1. do we know if the API delta between .22 and .23 is as significant as
>   between .22 and .20.2?
>
>   2. what's the common approach downstream to support multiple versions of
>   Hadoop APIs? Or is this even something that can be asked of all the
>   components?
>
> Thanks,
> Roman.
>

Re: Rejuvenate Hadoop 0.22 effort

Posted by Roman Shaposhnik <rv...@apache.org>.

On Fri, Sep 23, 2011 at 7:40 PM, Konstantin Boudnik <co...@apache.org> wrote:
> I'd say let's take a look at how bad the problems are; what are the discrepancies?

Excellent point! In fact, let me hook these jobs to Bigtop's Jenkins so that this
info can be seen by anybody who wants to see it. I'll take care of it tomorrow.

Thanks,
Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Roman Shaposhnik <rv...@apache.org>.

On Fri, Sep 23, 2011 at 7:40 PM, Konstantin Boudnik <co...@apache.org> wrote:
> I'd say let's take a look at how bad the problems are; what are the discrepancies?
>
> Do you have any build links or some such to point to?

Sorry. Took me a bit longer to hook up everything to our jenkins. Here's the
URL for the matrix job that is trying to compile everything in Bigtop against
Hadoop 0.22:
   http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/

I disabled HBase for now, since it is compiling perfectly and I don't want to
waste time doing it.

Otherwise -- it would be extremely nice if mapreduce folks could suggest patches
to make these things compile. Things like these:
   http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/COMPONENT=sqoop,label=centos5/2/console
   http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/COMPONENT=mahout,label=centos5/2/console

Must be pretty trivial.
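
For anyone who wants to check the other cells of the matrix, a small
sketch that rebuilds the console-log links from the pattern in the two
URLs above. The job URL, the centos5 label, and build number 2 are copied
from those links; `console_url` is a hypothetical helper, and other
components or builds may not exist on the server.

```python
# Job URL copied from the message above.
JOB_URL = "http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/"


def console_url(component: str, label: str = "centos5", build: int = 2) -> str:
    """Build the console-log URL for one cell of the matrix job."""
    return f"{JOB_URL}COMPONENT={component},label={label}/{build}/console"


if __name__ == "__main__":
    # The two components whose failures are linked above.
    for component in ("sqoop", "mahout"):
        print(console_url(component))
```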

Thanks,
Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Konstantin Boudnik <co...@apache.org>.

I'd say let's take a look at how bad the problems are; what are the discrepancies?

Do you have any build links or some such to point to?
  Cos
   
On Fri, Sep 23, 2011 at 05:34PM, Roman Shaposhnik wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <sh...@gmail.com> wrote:
> > == TESTING ==
> > 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> > great. Will be glad to see any steps in this direction.
> 
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
> 
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking changes to versions, pom.xml
> and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
> all need to be modified to support .22. Before that's done -- there's little
> that can be done as far as stack validation is concerned.
>
> Given that work needs to be done in downstream components, I've got 2 questions:
>    1. do we know if the API delta between .22 and .23 is as significant as
>    between .22 and .20.2?
>
>    2. what's the common approach downstream to support multiple versions of
>    Hadoop APIs? Or is this even something that can be asked of all the
>    components?
> 
> Thanks,
> Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Roman Shaposhnik <rv...@apache.org>.

On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> == TESTING ==
> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> great. Will be glad to see any steps in this direction.

The basic integration is done. We can produce fully functional RPM
and DEB packages for Hadoop 0.22 release.

This is good news. The bad news is that very few downstream components
can be compiled against .22. And I'm not talking changes to versions, pom.xml
and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
all need to be modified to support .22. Before that's done -- there's little
that can be done as far as stack validation is concerned.

Given that work needs to be done in downstream components, I've got 2 questions:
   1. do we know if the API delta between .22 and .23 is as significant as
   between .22 and .20.2?

   2. what's the common approach downstream to support multiple versions of
   Hadoop APIs? Or is this even something that can be asked of all the
   components?

Thanks,
Roman.

Re: Rejuvenate Hadoop 0.22 effort

Posted by Konstantin Shvachko <sh...@gmail.com>.

Thanks Owen. Will definitely look at those.
--Konstantin

On Fri, Sep 16, 2011 at 9:10 AM, Owen O'Malley <ow...@hortonworks.com> wrote:
> Konst,
>   You should evaluate not just the things that were marked as
> blockers 6 months ago, but also the things that have gone out in
> the 0.20.2xx line that aren't in 0.22.
>
> Areas of work that leap to mind:
>  1. fixes to the linux task controller.
>  2. rpm work
>  3. mr scheduler limits
>  4. capacity and fair share improvements
>  5. har improvements
>
> -- Owen
>

Re: Rejuvenate Hadoop 0.22 effort

Posted by Owen O'Malley <ow...@hortonworks.com>.

Konst,
   You should evaluate not just the things that were marked as
blockers 6 months ago, but also the things that have gone out in
the 0.20.2xx line that aren't in 0.22.

Areas of work that leap to mind:
  1. fixes to the linux task controller.
  2. rpm work
  3. mr scheduler limits
  4. capacity and fair share improvements
  5. har improvements

-- Owen