Posted to common-issues@hadoop.apache.org by "Alexander Stojanovic (Created) (JIRA)" <ji...@apache.org> on 2012/02/16 01:59:00 UTC

[jira] [Created] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
-------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-8079
                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
             Project: Hadoop Common
          Issue Type: Improvement
          Components: native
    Affects Versions: 1.0.0
            Reporter: Alexander Stojanovic
             Fix For: 1.0.0


This JIRA is intended to capture discussion around proposed work to enhance Apache Hadoop to run well on Windows.  Apache Hadoop has worked on Microsoft Windows since its inception, but Windows support has never been a priority. Currently Windows works as a development and testing platform for Hadoop, but Hadoop is not natively integrated, full-featured or performance and scalability tuned for Windows Server or Windows Azure.  We would like to change this and engage in a dialog with the broader community on the architectural design points for making Windows (enterprise and cloud) an excellent runtime and deployment environment for Hadoop.  
 
The Isotope team at Microsoft (names below) has developed an Apache Hadoop 1.0 patch set that addresses these performance, integration and feature gaps, allowing Apache Hadoop to be used with Azure and Windows Server without recourse to virtualization technologies such as Cygwin. We have significant interest in the deployment of Hadoop across many multi-tenant, PaaS and IaaS environments - which bring their own unique requirements. 

Microsoft has recently completed a CCLA with Apache and would like to contribute these enhancements back to the Apache Hadoop community.

In the interest of improving Apache Hadoop so that it runs more smoothly on all platforms, including Windows, we propose first contributing this work to the Apache community by attaching it to this JIRA.  From there we would like to work with the community to refine the patch set until it is ready to be merged into the Apache trunk.

Your feedback solicited,
 
Alexander Stojanovic
Min Wei
David Lao
Lengning Liu
David Zhang
Asad Khan

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "eric baldeschwieler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209057#comment-13209057 ] 

eric baldeschwieler commented on HADOOP-8079:
---------------------------------------------

+1 looking forward to seeing hadoop run in more places for more people!
                
>              Labels: hadoop
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Milind Bhandarkar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209539#comment-13209539 ] 

Milind Bhandarkar commented on HADOOP-8079:
-------------------------------------------

+1 Looking forward to these patches, and to getting rid of the Cygwin requirement.
                

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228153#comment-13228153 ] 

Sanjay Radia commented on HADOOP-8079:
--------------------------------------

@Eli> Why not do this in trunk first, like we do with other new features? branch-1 is the sustaining branch.

Branch 1-win is just being used as a proof of concept for the patches. The trunk patches are expected to be provided and checked into trunk before this jira is completed.
                
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215871#comment-13215871 ] 

Mahadev konar commented on HADOOP-8079:
---------------------------------------

Thanks David.
Can you please upload the patches to their respective JIRAs? E.g., windows-cmd-scripts.patch to https://issues.apache.org/jira/browse/HADOOP-8103.

Also note that you'll have to grant a license to Apache for inclusion. You'll see this option when you try uploading a patch.
                

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Steve Loughran (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217316#comment-13217316 ] 

Steve Loughran commented on HADOOP-8079:
----------------------------------------

Overall, it's a good start, though it could be made a bit more elegant and easier to test.

Testing is what worries me here: even if the release process and Jenkins test on Windows, there's no guarantee anyone else will, which increases the likelihood of a regression sneaking in. The less platform-specific code, the better.
* The ASF coding guidelines aren't fully followed; all if() clauses should have curly braces for better long-term maintenance, especially with patches.
* Some of the changes seem IDE-triggered, not OS-related; these should be removed as they complicate other patches and versions.
* I'm not sure about the "temp hack to copy file" comment above a method in {{FileUtil}}; it's a bit worrying.
* Even when exceptions are swallowed, logging them at debug level is wise, just in case something really, really unexpected happens.
* The patches imply that Cygwin will never be used again. Is this something everyone is happy with? I don't personally have any...
* I'm curious why the SymLink code opts to copy a file instead of using {{::CreateSymbolicLink()}}; I assume that an extended {{org.apache.hadoop.fs.HardLink}} class will also avoid {{::CreateHardLink()}}. I know these aren't exported via the Java runtime, but is there no way they could be invoked by executing something? If that's not possible, then this is a good time to add {{ln}} to the Windows command line.
* {{stop-slave.cmd}} and its siblings use the phrase "Microsoft Hadoop Distribution". This should not be in the ASF source, and would fall foul of the ASF trademark rules were it to be used in products not released by the ASF.
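
On the question of invoking link creation "by executing something": {{cmd.exe}} on Windows Vista and later has a built-in {{mklink}} command, so one possibility is to build the link command per platform and run it through the existing shell utilities. The following is only a minimal sketch under that assumption. The class and method names ({{LinkCommands}}, {{symlinkCommand}}, {{hardLinkCommand}}) are hypothetical and do not come from the patches, and it only constructs the argument arrays rather than executing them. Note that {{mklink}} takes the link name before the target, the reverse of {{ln}}, and that creating symbolic links on Windows usually requires elevated privileges.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of the patch set): builds link-creation commands. */
public class LinkCommands {

  /** Symbolic link: "mklink <link> <target>" on Windows, "ln -s <target> <link>" elsewhere. */
  public static String[] symlinkCommand(boolean windows, String target, String link) {
    return windows
        ? new String[] {"cmd", "/c", "mklink", link, target}
        : new String[] {"ln", "-s", target, link};
  }

  /** Hard link: "mklink /H <link> <target>" on Windows, "ln <target> <link>" elsewhere. */
  public static String[] hardLinkCommand(boolean windows, String target, String link) {
    return windows
        ? new String[] {"cmd", "/c", "mklink", "/H", link, target}
        : new String[] {"ln", target, link};
  }

  public static void main(String[] args) {
    // Print both variants; nothing is executed, so this runs on any platform.
    System.out.println(Arrays.toString(symlinkCommand(true, "C:\\data\\file", "C:\\data\\alias")));
    System.out.println(Arrays.toString(hardLinkCommand(false, "/data/file", "/data/alias")));
  }
}
```

Because the platform is passed in as a flag rather than read from a global, both branches can be unit tested on Linux as well as Windows.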

This is a good opportunity to introduce better abstractions, making it possible to test much of the abstracted behaviour (e.g. the file copying) even on Linux, and so ensuring that test coverage is higher across all platforms. For example, there are a lot of snippets like

{code}
  String[] shellCmd = {(Path.WINDOWS)?"cmd":"bash", 
 (Path.WINDOWS)?"/c":"-c", untarCommand.toString() };
{code}
And
{code}

    return (WINDOWS)? new String[]{"cmd", "/c", "df -k " + dirPath + " 2>nul"}:
        new String[] {"bash","-c","exec 'df' '-k' '" + dirPath + "' 2>/dev/null"};
   }
{code}

I could imagine helpers that generate a bash command or a Windows command from a list of args

{code}
// Builds {"bash", "-c", args...}
String[] bashCommand(String[] args) {
  String[] command = new String[args.length + 2];
  command[0] = "bash";
  command[1] = "-c";
  System.arraycopy(args, 0, command, 2, args.length);
  return command;
}

// Builds {"cmd", "/c", args...}
String[] winCommand(String[] args) {
  String[] command = new String[args.length + 2];
  command[0] = "cmd";
  command[1] = "/c";
  System.arraycopy(args, 0, command, 2, args.length);
  return command;
}

// Picks the builder for the current platform
String[] command(String[] args) {
  return WINDOWS ? winCommand(args) : bashCommand(args);
}
{code}


Similarly, {{quietBashCommand()}} and {{quietWinCommand()}} would set up the null output. You could test the bash/win command generation at the low level and verify that what you got is what is expected: unit tests for all platforms.
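
To make the testing idea concrete, here is a minimal sketch of such builders, including the "quiet" variants. It is an illustration under assumptions, not code from the patches: the class name {{ShellCommands}} and the {{windows}} parameter are hypothetical, and this version takes a single command-line string (as the quoted patch snippets do), because {{bash -c}} treats only the first following argument as the command to run. The null-output redirections ({{2>/dev/null}} and {{2>nul}}) are the ones used in the snippet quoted above.

```java
import java.util.Arrays;

/** Hypothetical sketch: shell command builders with the platform passed in as a flag. */
public class ShellCommands {

  public static String[] bashCommand(String commandLine) {
    return new String[] {"bash", "-c", commandLine};
  }

  public static String[] winCommand(String commandLine) {
    return new String[] {"cmd", "/c", commandLine};
  }

  /** Same as bashCommand, with stderr discarded. */
  public static String[] quietBashCommand(String commandLine) {
    return bashCommand(commandLine + " 2>/dev/null");
  }

  /** Same as winCommand, with stderr discarded. */
  public static String[] quietWinCommand(String commandLine) {
    return winCommand(commandLine + " 2>nul");
  }

  /** The platform flag is a parameter so both branches are testable on any OS. */
  public static String[] command(boolean windows, String commandLine) {
    return windows ? winCommand(commandLine) : bashCommand(commandLine);
  }

  public static String[] quietCommand(boolean windows, String commandLine) {
    return windows ? quietWinCommand(commandLine) : quietBashCommand(commandLine);
  }

  public static void main(String[] args) {
    // Nothing is executed here; the arrays are only printed for inspection.
    System.out.println(Arrays.toString(quietCommand(false, "df -k /tmp")));
    System.out.println(Arrays.toString(quietCommand(true, "df -k C:\\tmp")));
  }
}
```

A production version would presumably default the flag from the existing {{WINDOWS}} constant; keeping an overload that accepts it explicitly is what lets the Windows branch be verified from a Linux build.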


  
                

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Steve Loughran (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227506#comment-13227506 ] 

Steve Loughran commented on HADOOP-8079:
----------------------------------------

Another nice feature would be for Win32/64 versions of the native libraries to be available alongside the Hadoop releases (and an OS X version too). Integrating this into the main release process would be tricky, but a Windows VM with optimising 32- and 64-bit compilers could be used to do a simultaneous release.
                

        

[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Aaron T. Myers (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HADOOP-8079:
-----------------------------------

    Target Version/s: 0.24.0, 1.1.0  (was: 1.1.0)

Got it. Thanks for the explanation. I've also added a target version of 0.24.0, which corresponds to trunk.
                

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Alexander Stojanovic (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209081#comment-13209081 ] 

Alexander Stojanovic commented on HADOOP-8079:
----------------------------------------------

Sanjay:

I think the idea of breaking down the work into sub-JIRAs to ease discussion and review is the right way to go. We will post a full patch set to kick off the process; please feel free to suggest sub-JIRAs. Thanks (in advance).

                

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Bikas Saha (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230407#comment-13230407 ] 

Bikas Saha commented on HADOOP-8079:
------------------------------------

I applied these patches and ran the tests. Looks like some tests are failing.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Aaron T. Myers (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HADOOP-8079:
-----------------------------------

    Target Version/s: 1.1.0  (was: 1.0.0)
       Fix Version/s:     (was: 1.0.0)

Hi Alexander, the "fix version" field should only be set once the change has been committed and the JIRA resolved. Thus, I've removed that field.

Since 1.0.0 has already been released, it's not an appropriate "target version." I've changed the target version to 1.1.0.

One question I have - you say in the description that you've developed the patch set against Hadoop 1.0, but that you'd like to refine the patch set until it can be committed to Hadoop trunk. Is the intention to commit this both to branch-1 and trunk? Or just trunk?
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "David Lao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Lao updated HADOOP-8079:
------------------------------

    Attachment: hadoopcmdscripts.zip

Windows command scripts for Hadoop 1.0.0
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, microsoft-windowsazure-api-0.1.2.jar
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HADOOP-8079:
-------------------------------

    Target Version/s: 1.2.0, 0.24.0  (was: 1.1.0, 0.24.0)
    
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079-branch-1-win.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Steve Loughran (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227445#comment-13227445 ] 

Steve Loughran commented on HADOOP-8079:
----------------------------------------

Regarding the {{org.apache.hadoop.fs.azurenative}} classes

* keys like {{"fs.azure.buffer.dir"}} need to be pulled out and made constants; the embedding of strings is something the main codebase is slowly moving away from. Some of the code does this, but not all.
* The code depends on microsoft-windowsazure-api 0.1.2, which is in the maven repository. There's also a 0.2.0 version in there - any particular reason for not using the latest release?
* Testing? How is anyone working with this code going to use the fs? Is there S3-style remote access, or do you have to bring up a VM in the cluster?
* The catch of {{Exception}} and wrapping with {{AzureException}} is best set up so that {{IOException}} exceptions aren't caught and wrapped, as they match the signature. I don't know if the native API throws these, but adding an extra layer of nesting never helps with troubleshooting live systems.
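
The first and last points above can be illustrated with a minimal Java sketch. It is not taken from the patch: the class name AzureFsSketch, the constant name KEY_BUFFER_DIR, the helper callStore, and the assumption that AzureException extends IOException are all hypothetical, chosen only to show the shape of the suggestion.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical illustration only; none of these names come from the patch. */
final class AzureFsSketch {

    /** Configuration key pulled out as a constant instead of an embedded string. */
    public static final String KEY_BUFFER_DIR = "fs.azure.buffer.dir";

    /** Stand-in for the patch's AzureException (assumed here to extend IOException). */
    public static class AzureException extends IOException {
        public AzureException(Throwable cause) {
            super(cause);
        }
    }

    /**
     * Runs a store operation. An IOException already matches the method
     * signature, so it is rethrown as-is; only other exceptions are wrapped.
     */
    public static <T> T callStore(Callable<T> op) throws IOException {
        try {
            return op.call();
        } catch (IOException e) {
            throw e; // no extra layer of nesting
        } catch (Exception e) {
            throw new AzureException(e);
        }
    }

    private AzureFsSketch() {
    }
}
```

With this shape, a FileNotFoundException raised by the store would reach the caller unchanged, while an unexpected RuntimeException would arrive as an AzureException carrying the original as its cause.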

It may be cleaner to keep the azure FS source tree outside the main hadoop code, and host it in a parallel hadoop-azurefs project with the extra dependency, and the extra output artifacts. Anyone who added a mvn or ivy dependency on hadoop-azurefs would get the -api JAR, and testing could be isolated. This could also be a good opportunity to do the same for KFS, which is under-tested in the current release process, and for any other DFS clients that people want in the codebase. Maybe the policy should be: if it is testable by anyone, put it in the hadoop source tree, but if not, the FS vendor has to do it. (I'm thinking of things like GPFS here and others, not just AzureFS)
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "David Lao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Lao updated HADOOP-8079:
------------------------------

    Attachment: windows-cmd-scripts.patch
                mapred-tasks.patch
                general-utils-windows.patch
                security.patch

Added patches to match the sub-JIRA breakdown:
security.patch -> 1. Security
general-utils-windows.patch -> 2. General Utils - DU, DF, windows shell
mapred-tasks.patch -> 3. Interfacing with OS to make MR tasks
windows-cmd-scripts.patch -> cmd scripts
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Alexander Stojanovic (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209049#comment-13209049 ] 

Alexander Stojanovic commented on HADOOP-8079:
----------------------------------------------

Hi Aaron, thank you for the fix-ups. That is appreciated. 

It totally makes sense to designate 1.1.0 as the target version. We have been developing the patch against 1.0.

To your other question - yes, the goal is to commit to both branch-1 and trunk once the community discussion, feedback incorporation, and review processes have been completed to the community's satisfaction.

--Alexander
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213873#comment-13213873 ] 

Sanjay Radia commented on HADOOP-8079:
--------------------------------------

After going through the patch, here is a proposal for the JIRA breakdown:
# Security 
src/core/org/apache/hadoop/io/SecureIOUtils.java
src/core/org/apache/hadoop/security/UserGroupInformation.java
src/core/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java	
src/core/org/apache/hadoop/security/Credentials.java
src/core/org/apache/hadoop/fs/RawLocalFileSystem.java
src/core/org/apache/hadoop/util/ProcessTree.java
# General Utils  - DU, DF, windows shell 
src/core/org/apache/hadoop/fs/DU.java
src/core/org/apache/hadoop/fs/DUHelper.java
src/core/org/apache/hadoop/fs/DF.java
src/core/org/apache/hadoop/fs/FileUtil.java
src/core/org/apache/hadoop/util/Shell.java
# Interfacing with OS to make MR tasks
src/mapred/org/apache/hadoop/mapred/TaskController.java
src/mapred/org/apache/hadoop/mapred/Child.java
src/mapred/org/apache/hadoop/mapred/JvmManager.java
src/mapred/org/apache/hadoop/mapred/TaskRunner.java
src/mapred/org/apache/hadoop/mapred/TaskTracker.java
src/mapred/org/apache/hadoop/mapred/ReduceTask.java
src/mapred/org/apache/hadoop/mapred/TaskLog.java
src/mapred/org/apache/hadoop/mapred/DefaultTaskController.java
# Azure file system support
The Azure patches will go here.


                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, microsoft-windowsazure-api-0.1.2.jar
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Alexander Stojanovic (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214851#comment-13214851 ] 

Alexander Stojanovic commented on HADOOP-8079:
----------------------------------------------

Sanjay, the taxonomy you propose is pragmatic and reasonable. I believe that it helps segment the changes in the right way to ease discussion and review.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, microsoft-windowsazure-api-0.1.2.jar
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216999#comment-13216999 ] 

Eli Collins commented on HADOOP-8079:
-------------------------------------

Why not do this on trunk first, as we do with other new features? branch-1 is the sustaining branch.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Min Wei (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Wei updated HADOOP-8079:
----------------------------

    Attachment: microsoft-windowsazure-api-0.1.2.jar
                azurenative.zip
                hadoop-8079.AzureBlobStore.patch

The latest Windows Azure Java Client is available at: 

http://msdn.microsoft.com/en-us/library/windowsazure/hh690953(v=vs.103).aspx
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, microsoft-windowsazure-api-0.1.2.jar
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216234#comment-13216234 ] 

Mahadev konar commented on HADOOP-8079:
---------------------------------------

I just created a branch-1-win branch in svn. I'll try to create a Windows build within the next couple of days.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "David Lao (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240217#comment-13240217 ] 

David Lao commented on HADOOP-8079:
-----------------------------------

Added a patch for the branch-1-win branch. The patch includes changes for all the sub-JIRAs. Note that this is a work in progress. Tests affected by these changes are still under review, and test patches are forthcoming.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079-branch-1-win.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Min Wei (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Wei updated HADOOP-8079:
----------------------------

    Attachment: hadoop-8079.patch
    
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: hadoop-8079.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237261#comment-13237261 ] 

Sanjay Radia commented on HADOOP-8079:
--------------------------------------

A fair number of tests are failing. I suggest that the team work in "commit-then-review" mode in branch-1-win and iterate to improve the solution and fix tests until the branch works. Comments in the jiras will be addressed. After that, the team posts a set of small trunk patches to make review convenient.


                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213785#comment-13213785 ] 

Sanjay Radia commented on HADOOP-8079:
--------------------------------------

Looked at hadoop-8079.patch; it is quite small.
I did not see any Windows commands corresponding to the bin/hadoop series of bash scripts.
Did you forget to attach those files, or forget to run "svn add" or "git add" before generating the patch?
 
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, microsoft-windowsazure-api-0.1.2.jar
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215817#comment-13215817 ] 

Mahadev konar commented on HADOOP-8079:
---------------------------------------

@Sanjay,
 Thanks for creating the sub-jiras. I am going to create a branch 1.0-win off the 1.0 branch so that we can quickly iterate on the patches on a branch and then see if it all falls into place. It will also help to set up a Windows build on that branch so that folks can take a look at it.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, microsoft-windowsazure-api-0.1.2.jar
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209069#comment-13209069 ] 

Sanjay Radia commented on HADOOP-8079:
--------------------------------------

This is great for Hadoop - it expands the set of platforms and market for Hadoop.

I also suggest that we break the work down into sub-jiras so that the community can review smaller chunks. If you post a full patch set I can suggest sub-jiras.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Steve Loughran (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227441#comment-13227441 ] 

Steve Loughran commented on HADOOP-8079:
----------------------------------------

I was just thinking that it could be done more cleanly even inside Shell, as that does get subclassed, and issues like MAPREDUCE-3967 show that all uses of bash need to be isolated and reviewed, ideally tested with unit tests that can generate the Windows strings even on Linux boxes.
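
As a rough illustration of that idea (not code from any attached patch), the sketch below passes the platform in explicitly instead of reading a static flag, so a test on a Linux box can still generate and check the Windows command line. The names PlatformShellSketch, Platform, Command and ListDirCommand are hypothetical stand-ins, and the Command base class is only a simplified imitation of a Shell-like class whose subclasses supply the exec string.

```java
import java.util.Arrays;

class PlatformShellSketch {
    /** Hypothetical stand-in for a static WINDOWS flag, passed in explicitly. */
    enum Platform { UNIX, WINDOWS }

    /** Simplified stand-in for a Shell-like base class; subclasses supply the command. */
    abstract static class Command {
        protected final Platform platform;

        Command(Platform platform) {
            this.platform = platform;
        }

        /** Returns the argument vector that would be handed to the process launcher. */
        abstract String[] getExecString();
    }

    /** Hypothetical subclass: list a directory without assuming bash exists. */
    static class ListDirCommand extends Command {
        private final String dir;

        ListDirCommand(Platform platform, String dir) {
            super(platform);
            this.dir = dir;
        }

        @Override
        String[] getExecString() {
            return platform == Platform.WINDOWS
                ? new String[] {"cmd", "/c", "dir", dir}
                : new String[] {"ls", "-l", dir};
        }
    }

    public static void main(String[] args) {
        // Both variants can be generated on any host, so a Linux build box
        // can still check the Windows command line.
        System.out.println(Arrays.toString(
            new ListDirCommand(Platform.WINDOWS, "C:\\tmp").getExecString()));
        System.out.println(Arrays.toString(
            new ListDirCommand(Platform.UNIX, "/tmp").getExecString()));
    }
}
```

Because the command string is built separately from process execution, a unit test only needs to compare arrays and never has to launch cmd or bash.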
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Updated] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "David Lao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Lao updated HADOOP-8079:
------------------------------

    Attachment: hadoop-8079-branch-1-win.patch
    
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079-branch-1-win.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220490#comment-13220490 ] 

Sanjay Radia commented on HADOOP-8079:
--------------------------------------

While what you are suggesting helps, the code changes to something like:
{code}
 command( WINDOWS ? foo-windows : foo-bash)
{code}

Many (but not all) commands are inside Shell.java and fairly isolated from the rest of the code. An example is:
{code}
  public static String[] getGroupsForUserCommand(final String user) {
    // Output of 'groups username' is inconsistent across Unix variants, so 'id -Gn' is used instead
    return WINDOWS
        ? new String[] {"cmd", "/c", "id -Gn " + user}
        : new String[] {"bash", "-c", "id -Gn " + user};
  }
{code}
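
One possible way to keep such ternaries from spreading, sketched here purely as an assumption and not taken from Shell.java or the attached patches, is to choose a per-platform command set once and have callers ask it for commands. CommandSetSketch, CommandSet, BashCommands, WindowsCommands and forOsName are hypothetical names; the two command arrays are copied from the snippet above.

```java
import java.util.Arrays;

class CommandSetSketch {
    /** Hypothetical interface: one method per external command that is needed. */
    interface CommandSet {
        String[] groupsForUser(String user);
    }

    /** Bash variant, copied from the getGroupsForUserCommand snippet. */
    static class BashCommands implements CommandSet {
        @Override
        public String[] groupsForUser(String user) {
            return new String[] {"bash", "-c", "id -Gn " + user};
        }
    }

    /** Windows variant, copied from the same snippet. */
    static class WindowsCommands implements CommandSet {
        @Override
        public String[] groupsForUser(String user) {
            return new String[] {"cmd", "/c", "id -Gn " + user};
        }
    }

    /** Picks the command set from an os.name-style string; this is the only platform check. */
    static CommandSet forOsName(String osName) {
        return osName.startsWith("Windows") ? new WindowsCommands() : new BashCommands();
    }

    public static void main(String[] args) {
        CommandSet commands = forOsName(System.getProperty("os.name"));
        System.out.println(Arrays.toString(commands.groupsForUser("alice")));
    }
}
```

With this shape the platform decision lives in forOsName alone, and adding a new command means adding one method to each implementation rather than another inline ternary.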
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar, security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>


        

[jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

Posted by "Alexander Stojanovic (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213749#comment-13213749 ] 

Alexander Stojanovic commented on HADOOP-8079:
----------------------------------------------

We have uploaded the patch for Azure Storage (i.e. XStore) support within HDFS. Our goal has been to enable Azure Storage to be used in a fashion analogous to AWS's S3. It is designed to supplement the intra-cluster "local" HDFS in our Hadoop on Azure service.
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, hadoop-8079.AzureBlobStore.patch, hadoop-8079.patch, microsoft-windowsazure-api-0.1.2.jar
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
