Posted to general@hadoop.apache.org by Konstantin Shvachko <sh...@gmail.com> on 2011/09/16 11:21:03 UTC
Rejuvenate Hadoop 0.22 effort
Hi everybody,
I think there is no need to change anything drastically with the plans
for Hadoop 0.22 release, so I'll continue along the lines previously
rendered by Nigel, discussed, and agreed upon within the community.
1. First thing, we need to resurrect Hadoop-0.22 Jenkins builds ASAP.
Does anybody want to help with that? Any help is greatly appreciated.
2. I will start sorting out jiras currently assigned to 0.22. There
are 10 blockers (again) over the three projects. Details below. My
plan is to get the release candidate out late October.
3. Let's start discussing what else people think needs to be included
in 0.22. I will include issues based on the following priorities
- build fixes,
- test failures,
- bug fixes,
- documentation
- compatibility issues directed at making H-0.22 work with other
projects (HBase, Pig, Hive, Oozie)
- minor improvements (irritating for users but simple)
- no new features in 0.22.0, but I'd like to have a list of things
which people would've considered for 0.22.1
4. I will use the following filter to watch the jiras assigned to the release:
project in (HADOOP, HDFS, MAPREDUCE) AND resolution = Unresolved AND
fixVersion = "0.22.0" ORDER BY priority DESC
If you think an issue should be considered for inclusion please set
fixVersion = "0.22.0". I will mark them as blockers based on the
priorities above and my common sense.
Note, if the jira is consciously assigned to a contributor, it has a high
chance of making it into the blockers.
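For anyone who wants to script the same watch list, here is a minimal Python sketch that turns the filter from item 4 into a search URL. It assumes the JIRA instance exposes the REST search resource at /rest/api/2/search with jql and maxResults query parameters, and that https://issues.apache.org/jira is the base URL; the helper name search_url is made up for illustration. It only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# JQL text copied from item 4 of the message.
JQL = (
    'project in (HADOOP, HDFS, MAPREDUCE) AND resolution = Unresolved AND '
    'fixVersion = "0.22.0" ORDER BY priority DESC'
)


def search_url(base_url, jql, max_results=50):
    """Build a search URL for a JIRA instance (hypothetical helper).

    Assumes the instance serves the REST search resource under
    /rest/api/2/search; adjust the path if your instance differs.
    """
    query = urlencode({"jql": jql, "maxResults": max_results})
    return f"{base_url.rstrip('/')}/rest/api/2/search?{query}"


# Assumed base URL for the Apache JIRA instance.
url = search_url("https://issues.apache.org/jira", JQL)
print(url)
```

Keeping the JQL as a single constant means the script and the saved filter cannot drift apart silently: changing fixVersion in one place changes the query everywhere it is used.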
== TESTING ==
5. I think Steve's idea of integrating 0.22 with Apache BigTop is
great. Will be glad to see any steps in this direction.
6. Hadoop-0.22 has been in testing since January 2011. We conducted some
internal testing lately. Testing is proceeding now on a dev cluster.
If anybody plans to set up a cluster for testing and wants to
coordinate the efforts, please ping me.
== 10 BLOCKERS ==
7. There are 10 official blockers.
Key             Assignee      Summary
MAPREDUCE-1991  Todd Lipcon   taskcontroller allows stealing permissions on any local file
MAPREDUCE-2178  Devaraj Das   Race condition in LinuxTaskController permissions handling
MAPREDUCE-2266  Unassigned    JvmManager sleeps between SIGTERM and SIGKILL while holding many TT locks
I will unblock TaskController issues as per discussion related to
MAPREDUCE-2767.
MAPREDUCE-1100  Vinod Kumar   User's task-logs filling up local disks on the TaskTrackers
MAPREDUCE-1716  Vinod Kumar   MAPREDUCE-1100 Truncate logs of finished tasks to prevent node thrash due to excessive logging
Don't see any activity from Vinod. Any volunteers to port this to 0.22?
HADOOP-7035     Tom White     Document incompatible API changes between releases
Looks like close to completion. Tom are you still on it?
MAPREDUCE-2268  Todd Lipcon   With JVM reuse, JvmManager doesn't delete last workdir properly
Todd, is it a blocker? Do you plan to fix it soon?
MAPREDUCE-1506  Unassigned    Assertion failure in TestTaskTrackerMemoryManager
Will unblock, as no volunteers emerged.
HDFS-1967       Unassigned    HDFS-1852 TestHDFSTrash failing on trunk and 22
HDFS-2012       Unassigned    Recurring failure of TestBalancer on branch-0.22
Don't see failures anymore. Will follow up when Jenkins builds are restored
HDFS-2290       Benoy Antony  Block with corrupt replica is not getting replicated
Close to completion.
Thanks,
--Konstantin
Re: Rejuvenate Hadoop 0.22 effort
Posted by Konstantin Shvachko <sh...@gmail.com>.
>> About connecting with other projects.
>> HBase is compiling with 0.22.
>
> That is trunk of HBase right? IOW, we don't really have a released version
> that is compatible with .22?
I mean HBase 0.92, which was branched recently.
>> For Pig there is
>> https://issues.apache.org/jira/browse/PIG-2277
>> For Hive created
>> https://issues.apache.org/jira/browse/HIVE-2468
>>
>> The direction with Hive and Pig is to create shim layers for different versions.
>
> IOW, a single build of Hive and Pig being able to communicate
> with different versions of Hadoop?
>
> This is fine, but sounds more time consuming than what HBase is
> doing (providing profiles to build against different versions of Hadoop).
>
> Regardless of how time consuming either approach is, I guess my
> fundamental question would be -- do we have any kind of commitment
> from the downstream guys to have a release compatible with .22?
>
> I guess I'm just wondering how these timelines of downstream
> components will affect usability of any Hadoop release (be it .22 or .23).
> Any thoughts on that?
>
> Thanks,
> Roman.
>
Re: Rejuvenate Hadoop 0.22 effort
Posted by Roman Shaposhnik <rv...@apache.org>.
On Sun, Sep 25, 2011 at 6:02 PM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> Good news Roman!
>
> About connecting with other projects.
> HBase is compiling with 0.22.
That is trunk of HBase right? IOW, we don't really have a released version
that is compatible with .22?
> For Pig there is
> https://issues.apache.org/jira/browse/PIG-2277
> For Hive created
> https://issues.apache.org/jira/browse/HIVE-2468
>
> The direction with Hive and Pig is to create shim layers for different versions.
IOW, a single build of Hive and Pig being able to communicate
with different versions of Hadoop?
This is fine, but sounds more time consuming than what HBase is
doing (providing profiles to build against different versions of Hadoop).
Regardless of how time consuming either approach is, I guess my
fundamental question would be -- do we have any kind of commitment
from the downstream guys to have a release compatible with .22?
I guess I'm just wondering how these timelines of downstream
components will affect usability of any Hadoop release (be it .22 or .23).
Any thoughts on that?
Thanks,
Roman.
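To make the "shim layer" direction discussed above concrete, here is a minimal Python sketch of the pattern: one build that picks a version-specific adapter at run time. It is an illustration only. Hive and Pig are Java projects and their actual shim code is not shown in this thread; every class, method, and version key below (HadoopShim, Shim020, Shim022, load_shim, submit) is hypothetical.

```python
from abc import ABC, abstractmethod


class HadoopShim(ABC):
    """Hypothetical interface the rest of the application codes against."""

    @abstractmethod
    def submit(self, job_name: str) -> str:
        """Submit a job through the version-specific code path."""


class Shim020(HadoopShim):
    # Stand-in for calls that only exist in the 0.20 line.
    def submit(self, job_name: str) -> str:
        return f"[0.20 path] submitted {job_name}"


class Shim022(HadoopShim):
    # Stand-in for calls that only exist in the 0.22 line.
    def submit(self, job_name: str) -> str:
        return f"[0.22 path] submitted {job_name}"


# Registry keyed by "major.minor"; one entry per supported Hadoop line.
_SHIMS = {"0.20": Shim020, "0.22": Shim022}


def load_shim(hadoop_version: str) -> HadoopShim:
    """Pick the adapter matching the Hadoop version found at run time."""
    major_minor = ".".join(hadoop_version.split(".")[:2])
    try:
        return _SHIMS[major_minor]()
    except KeyError:
        raise ValueError(f"no shim registered for Hadoop {hadoop_version}") from None


print(load_shim("0.22.0").submit("wordcount"))
```

The trade-off raised in the message shows up directly: the shim approach needs an adapter written and maintained for every supported Hadoop line, whereas build profiles select the dependency at compile time and produce one artifact per Hadoop version.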
Re: Rejuvenate Hadoop 0.22 effort
Posted by Konstantin Shvachko <sh...@gmail.com>.
Good news Roman!
About connecting with other projects.
HBase is compiling with 0.22.
For Pig there is
https://issues.apache.org/jira/browse/PIG-2277
For Hive created
https://issues.apache.org/jira/browse/HIVE-2468
The direction with Hive and Pig is to create shim layers for different versions.
Don't know about the API delta between .22 and .23 yet. I assume it is
less than 0.20 vs 0.22. But I may be wrong.
--Konstantin
On Fri, Sep 23, 2011 at 5:34 PM, Roman Shaposhnik <rv...@apache.org> wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <sh...@gmail.com> wrote:
>> == TESTING ==
>> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
>> great. Will be glad to see any steps in this direction.
>
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
>
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking changes to versions, pom.xml
> and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
> all need to be modified to support .22. Before that's done -- there's little
> that can be done as far as stack validation is concerned.
>
> Given that work needs to be done in downstream components, I've got 2 questions:
> 1. do we know if the API delta between .22 and .23 is as significant as
> between .22 and .20.2?
>
> 2. what's the common approach downstream to support multiple versions of
> Hadoop APIs? Or is this even something that can be asked of all the
> components?
>
> Thanks,
> Roman.
>
Re: Rejuvenate Hadoop 0.22 effort
Posted by Roman Shaposhnik <rv...@apache.org>.
On Fri, Sep 23, 2011 at 7:40 PM, Konstantin Boudnik <co...@apache.org> wrote:
> I'd say let's take a look at how bad are the problems; what are discrepancies?
Excellent point! In fact, let me hook these jobs to Bigtop's Jenkins so that
anybody who wants to can see this info. I'll take care of it tomorrow.
Thanks,
Roman.
Re: Rejuvenate Hadoop 0.22 effort
Posted by Roman Shaposhnik <rv...@apache.org>.
On Fri, Sep 23, 2011 at 7:40 PM, Konstantin Boudnik <co...@apache.org> wrote:
> I'd say let's take a look at how bad are the problems; what are discrepancies?
>
> Do you have any build links or some such to point to?
Sorry. Took me a bit longer to hook up everything to our Jenkins. Here's the
URL for the matrix job that is trying to compile everything in Bigtop against
Hadoop 0.22:
http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/
I disabled HBase for now, since it is compiling perfectly and I don't want to
waste time doing it.
Otherwise -- it would be extremely nice if mapreduce folks could suggest patches
to make these things compile. Things like these:
http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/COMPONENT=sqoop,label=centos5/2/console
http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop22/COMPONENT=mahout,label=centos5/2/console
Must be pretty trivial.
Thanks,
Roman.
Re: Rejuvenate Hadoop 0.22 effort
Posted by Konstantin Boudnik <co...@apache.org>.
I'd say let's take a look at how bad are the problems; what are discrepancies?
Do you have any build links or some such to point to?
Cos
On Fri, Sep 23, 2011 at 05:34PM, Roman Shaposhnik wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <sh...@gmail.com> wrote:
> > == TESTING ==
> > 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> > great. Will be glad to see any steps in this direction.
>
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
>
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking changes to versions, pom.xml
> and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
> all need to be modified to support .22. Before that's done -- there's little
> that can be done as far as stack validation is concerned.
>
> Given that work needs to be done in downstream components, I've got 2 questions:
> 1. do we know if the API delta between .22 and .23 is as significant as
> between .22 and .20.2?
>
> 2. what's the common approach downstream to support multiple versions of
> Hadoop APIs? Or is this even something that can be asked of all the
> components?
>
> Thanks,
> Roman.
Re: Rejuvenate Hadoop 0.22 effort
Posted by Roman Shaposhnik <rv...@apache.org>.
On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> == TESTING ==
> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> great. Will be glad to see any steps in this direction.
The basic integration is done. We can produce fully functional RPM
and DEB packages for Hadoop 0.22 release.
This is good news. The bad news is that very few downstream components
can be compiled against .22. And I'm not talking changes to versions, pom.xml
and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
all need to be modified to support .22. Before that's done -- there's little
that can be done as far as stack validation is concerned.
Given that work needs to be done in downstream components, I've got 2 questions:
1. do we know if the API delta between .22 and .23 is as significant as
between .22 and .20.2?
2. what's the common approach downstream to support multiple versions of
Hadoop APIs? Or is this even something that can be asked of all the
components?
Thanks,
Roman.
Re: Rejuvenate Hadoop 0.22 effort
Posted by Konstantin Shvachko <sh...@gmail.com>.
Thanks Owen. Will definitely look at those.
--Konstantin
On Fri, Sep 16, 2011 at 9:10 AM, Owen O'Malley <ow...@hortonworks.com> wrote:
> Konst,
> You should evaluate not just the things that were marked as blockers
> 6 months ago, but also the things that have gone out in the 0.20.2xx
> line that aren't in 0.22.
>
> Areas of work that leap to mind:
> 1. fixes to the linux task controller.
> 2. rpm work
> 3. mr scheduler limits
> 4. capacity and fair share improvements
> 5. har improvements
>
> -- Owen
>
Re: Rejuvenate Hadoop 0.22 effort
Posted by Owen O'Malley <ow...@hortonworks.com>.
Konst,
You should evaluate not just the things that were marked as blockers
6 months ago, but also the things that have gone out in the 0.20.2xx
line that aren't in 0.22.
Areas of work that leap to mind:
1. fixes to the linux task controller.
2. rpm work
3. mr scheduler limits
4. capacity and fair share improvements
5. har improvements
-- Owen