Posted to common-dev@hadoop.apache.org by Suresh Srinivas <su...@hortonworks.com> on 2013/02/08 02:42:15 UTC

Heads up - merge branch-trunk-win to trunk

Support for Hadoop on Windows was proposed in
HADOOP-8079<https://issues.apache.org/jira/browse/HADOOP-8079> almost
a year ago. The goal was to make Hadoop natively integrated, full-featured,
and tuned for performance and scalability on Windows Server or Windows Azure.
We are happy to announce that a lot of progress has been made in this
regard.

Initial work started in a feature branch, branch-1-win, based on branch-1.
The details related to the work done in the branch can be seen in
CHANGES.txt<http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup>.
This work has been ported to a branch, branch-trunk-win, based on trunk.
The merge patch for this is available on
HADOOP-8562<https://issues.apache.org/jira/browse/HADOOP-8562>.

Highlights of the work done so far:
1. Necessary changes in Hadoop to run natively on Windows. These changes
handle differences in platforms related to path names, process/task
management etc.
2. Addition of winutils tools for managing file permissions and ownership,
user group mapping, hardlinks, symbolic links, chmod, disk utilization, and
process/task management.
3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh,
start and stop scripts.
4. Addition of a block placement policy implementation to support cloud
environments, more specifically Azure.

We are very close to wrapping up the work in branch-trunk-win and getting
ready for a merge. Currently the merge patch is passing close to 100% of
unit tests on Linux. Soon I will call for a vote to merge this branch into
trunk.

Next steps:
1. Call for vote to merge branch-trunk-win to trunk, when the work
completes and precommit build is clean.
2. Start a discussion on adding Jenkins precommit builds on Windows and how
to integrate them with the existing commit process.

Let me know if you have any questions.

Regards,
Suresh

Re: Heads up - merge branch-trunk-win to trunk

Posted by Suresh Srinivas <su...@hortonworks.com>.
I was planning to add details about the testing in the subsequent voting
thread.

As pointed out in the emails above, a lot of the work is related to handling
differences between the platforms in paths and utilities. These changes lend
themselves to being tested very well by the existing unit tests.

That said, a lot of testing has happened to validate the development done in
these branches.

On branch-1-win, Hortonworks QA has been running a suite of comprehensive
system tests. This involves testing the branch with all the other stack
components, such as Pig, Hive, HCat, HBase and Oozie. These tests have been
done on both Linux and Windows, and on both JDK 6 and 7. In addition, as
Mahadevan pointed out, it has also been tested by early customers with
production loads and at scale.

branch-trunk-win has a lot of code in common with branch-1-win and hence
benefits from all the above testing. One place where additional work and
testing was done in branch-trunk-win is related to YARN. All the
MapReduce-related workloads have been validated on a single-node cluster and
on clusters of upwards of 10 nodes.




On Thu, Feb 7, 2013 at 6:46 PM, Eli Collins <el...@cloudera.com> wrote:

> Thanks for the update Suresh.  Has any testing been done on the branch on
> Linux aside from running the unit tests?
>
> Thanks,
> Eli
>
>
> On Thu, Feb 7, 2013 at 5:42 PM, Suresh Srinivas <suresh@hortonworks.com
> >wrote:
>
> > The support for Hadoop on Windows was proposed in
> > HADOOP-8079<https://issues.apache.org/jira/browse/HADOOP-8079> almost
> > a year ago. The goal was to make Hadoop natively integrated,
> full-featured,
> > and performance and scalability tuned on Windows Server or Windows Azure.
> > We are happy to announce that a lot of progress has been made in this
> > regard.
> >
> > Initial work started in a feature branch, branch-1-win, based on
> branch-1.
> > The details related to the work done in the branch can be seen in
> > CHANGES.txt<
> >
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup
> > >.
> > This work has been ported to a branch, branch-trunk-win, based on trunk.
> > Merge patch for this is available on
> > HADOOP-8562<https://issues.apache.org/jira/browse/HADOOP-8562>
> > .
> >
> > Highlights of the work done so far:
> > 1. Necessary changes in Hadoop to run natively on Windows. These changes
> > handle differences in platforms related to path names, process/task
> > management etc.
> > 2. Addition of winutils tools for managing file permissions and
> ownership,
> > user group mapping, hardlinks, symbolic links, chmod, disk utilization,
> and
> > process/task management.
> > 3. Added cmd scripts equivalent to existing shell scripts
> hadoop-daemon.sh,
> > start and stop scripts.
> > 4. Addition of block placement policy implemnation to support cloud
> > enviroment, more specifically Azure.
> >
> > We are very close to wrapping up the work in branch-trunk-win and getting
> > ready for a merge. Currently the merge patch is passing close to 100% of
> > unit tests on Linux. Soon I will call for a vote to merge this branch
> into
> > trunk.
> >
> > Next steps:
> > 1. Call for vote to merge branch-trunk-win to trunk, when the work
> > completes and precommit build is clean.
> > 2. Start a discussion on adding Jenkins precommit builds on windows and
> how
> > to integrate that with the existing commit process.
> >
> > Let me know if you have any questions.
> >
> > Regards,
> > Suresh
> >
>



-- 
http://hortonworks.com/download/

Re: Heads up - merge branch-trunk-win to trunk

Posted by Eli Collins <el...@cloudera.com>.
Thanks for the update Suresh.  Has any testing been done on the branch on
Linux aside from running the unit tests?

Thanks,
Eli


On Thu, Feb 7, 2013 at 5:42 PM, Suresh Srinivas <su...@hortonworks.com> wrote:

> The support for Hadoop on Windows was proposed in
> HADOOP-8079<https://issues.apache.org/jira/browse/HADOOP-8079> almost
> a year ago. The goal was to make Hadoop natively integrated, full-featured,
> and performance and scalability tuned on Windows Server or Windows Azure.
> We are happy to announce that a lot of progress has been made in this
> regard.
>
> Initial work started in a feature branch, branch-1-win, based on branch-1.
> The details related to the work done in the branch can be seen in
> CHANGES.txt<
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup
> >.
> This work has been ported to a branch, branch-trunk-win, based on trunk.
> Merge patch for this is available on
> HADOOP-8562<https://issues.apache.org/jira/browse/HADOOP-8562>
> .
>
> Highlights of the work done so far:
> 1. Necessary changes in Hadoop to run natively on Windows. These changes
> handle differences in platforms related to path names, process/task
> management etc.
> 2. Addition of winutils tools for managing file permissions and ownership,
> user group mapping, hardlinks, symbolic links, chmod, disk utilization, and
> process/task management.
> 3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh,
> start and stop scripts.
> 4. Addition of block placement policy implemnation to support cloud
> enviroment, more specifically Azure.
>
> We are very close to wrapping up the work in branch-trunk-win and getting
> ready for a merge. Currently the merge patch is passing close to 100% of
> unit tests on Linux. Soon I will call for a vote to merge this branch into
> trunk.
>
> Next steps:
> 1. Call for vote to merge branch-trunk-win to trunk, when the work
> completes and precommit build is clean.
> 2. Start a discussion on adding Jenkins precommit builds on windows and how
> to integrate that with the existing commit process.
>
> Let me know if you have any questions.
>
> Regards,
> Suresh
>

RE: Heads up - merge branch-trunk-win to trunk

Posted by Mahadevan Venkatraman <ma...@microsoft.com>.
It is super exciting to look at the prospect of these changes being merged to trunk. Having Windows as one of the supported Hadoop platforms is a fantastic opportunity both for the Hadoop project and Microsoft customers.

This work began around a year ago when a few of us started with a basic port of Hadoop on Windows. Since then, the Hadoop team at Microsoft has made significant progress in the following areas:
(PS: Some of these items are already included in Suresh's email, but they are included again here for completeness.)

- Command-line scripts for the Hadoop surface area
- Mapping the HDFS permissions model to Windows
- Abstracted and reconciled mismatches around differences in Path semantics in Java and Windows
- Native Task Controller for Windows 
- Implementation of a Block Placement Policy to support cloud environments, more specifically Azure.
- Implementation of Hadoop native libraries for Windows (compression codecs, native I/O)
- Fixes for several reliability issues, including race conditions, intermittent test failures, and resource leaks
- Several new unit test cases written for the above changes

In the process, we have engaged closely with the Apache open source community and have received great support and assistance from the community in the form of fixes, code review comments and commits.

In addition, the Hadoop team at Microsoft has also made good progress in other projects including Hive, Pig, Sqoop, Oozie, HCat and HBase. Many of these changes have already been committed to the respective trunks with help from various committers and contributors. It is great to see the commitment of the community to support multiple platforms, and we look forward to the day when a developer/customer is able to successfully deploy a complete solution stack based on Apache Hadoop releases.

Next Steps:

All of the above changes are part of the Windows Azure HDInsight and HDInsight Server products from Microsoft. We have successfully on-boarded several internal customers and have been running production workloads on Windows Azure HDInsight. Our vision is to create a big data platform based on Hadoop, and we are committed to helping make Hadoop a world-class solution that anyone can use to solve their biggest data challenges. 

As an immediate next step, we would like to have a discussion around how we can ensure that the quality of the mainline Hadoop branches on Windows is maintained. To this end, we would like to get to the state where we have pre-checkin validation gates and nightly test runs enabled on Windows. If you have any suggestions around this, please do send an email.  We are committed to helping sustain the long-term quality of Hadoop on both Linux and Windows.

We sincerely thank the community for their contribution and support so far, and we hope to continue having a close engagement in the future.

-Microsoft HDInsight Team


-----Original Message-----
From: Suresh Srinivas [mailto:suresh@hortonworks.com] 
Sent: Thursday, February 7, 2013 5:42 PM
To: common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
Subject: Heads up - merge branch-trunk-win to trunk

The support for Hadoop on Windows was proposed in HADOOP-8079<https://issues.apache.org/jira/browse/HADOOP-8079> almost a year ago. The goal was to make Hadoop natively integrated, full-featured, and performance and scalability tuned on Windows Server or Windows Azure.
We are happy to announce that a lot of progress has been made in this regard.

Initial work started in a feature branch, branch-1-win, based on branch-1.
The details related to the work done in the branch can be seen in CHANGES.txt<http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup>.
This work has been ported to a branch, branch-trunk-win, based on trunk.
Merge patch for this is available on
HADOOP-8562<https://issues.apache.org/jira/browse/HADOOP-8562>
.

Highlights of the work done so far:
1. Necessary changes in Hadoop to run natively on Windows. These changes handle differences in platforms related to path names, process/task management etc.
2. Addition of winutils tools for managing file permissions and ownership, user group mapping, hardlinks, symbolic links, chmod, disk utilization, and process/task management.
3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh, start and stop scripts.
4. Addition of block placement policy implemnation to support cloud enviroment, more specifically Azure.

We are very close to wrapping up the work in branch-trunk-win and getting ready for a merge. Currently the merge patch is passing close to 100% of unit tests on Linux. Soon I will call for a vote to merge this branch into trunk.

Next steps:
1. Call for vote to merge branch-trunk-win to trunk, when the work completes and precommit build is clean.
2. Start a discussion on adding Jenkins precommit builds on windows and how to integrate that with the existing commit process.

Let me know if you have any questions.

Regards,
Suresh

